Observations or measurements taken of a quantum system (a small number of fundamental
particles) are inherently random. If the state of the system depends on unknown parame- 
ters, then the distribution of the outcome depends on these parameters too, and statistical 
inference problems result. Often one has a choice of what measurement to take, corre- 
sponding to different experimental set-ups or settings of measurement apparatus. This 
leads to a design problem — which measurement is best for a given statistical problem. 
This paper gives an introduction to this field in the most simple of settings, that of esti- 
mating the state of a spin-half particle given n independent copies of the particle. We show 
how in some cases asymptotically optimal measurements can be constructed. Other cases 
present interesting open problems, connected to the fact that for some models, quantum 
Fisher information is in some sense non-additive. In physical terms, we have non-locality 
without entanglement.

1 Introduction

The fields of quantum statistics and quantum probability have a reputation 
for being esoteric. However, in our opinion, quantum mechanics is a fasci- 
nating source of probabilistic and statistical models, unjustly little known 
to 'ordinary' statisticians and probabilists. 

Quantum mechanics has two main ingredients: one deterministic, one 
random. In isolation from the outside world a quantum system evolves
deterministically according to Schrödinger's equation. That is to say, it is
described by a state or wave-function whose time evolution is the (reversible)
solution of a differential equation. On the other hand when this system 
comes into interaction with the outside world, as when for instance mea- 
surements are made of it (photons are counted by a photo-detector, tracks 
of particles observed in a cloud chamber, etc.) something random and irre- 
versible takes place. The state of the system makes a random jump and the 
outside world in some way contains a record of the jump. From the state of 
the system at the time of the interaction one can read off, according to cer- 
tain rules, the probability distribution of the macroscopic outcomes and the 
new state of the system. (See Penrose, 1994, for an eloquent discussion of
why there is something paradoxical in the peaceful coexistence of these two
principles; and see Percival, 1998, for interesting stochastic modifications to
Schrödinger's equation which might offer some reconciliation.)¹

Until recently most predictions made from quantum theory involved such
large numbers of particles that the law of large numbers took over and the
predictions were effectively deterministic. However, technology is rapidly
advancing to the point where really small quantum systems can be manipulated
and measured (e.g., a single ion in a vacuum chamber, or a small number of
photons transmitted through an optical communication system). Then the
outcomes definitely are random. The fields of quantum computing, quantum
communication, and quantum cryptography are rapidly developing and depend on
the ability to manipulate really small quantum systems. Theory and conjecture
are far ahead of experiment and technology, but the latter are following
steadily.

In this paper we will introduce as simply as possible the model of quan- 
tum statistics and consider the problem of how best to measure the state 
of an unknown spin-half system. We will survey some recent results, in 
particular, from joint work with O.E. Barndorff-Nielsen and with S. Mas- 
sar (Barndorff-Nielsen and Gill, 1998; Gill and Massar, 1998). This work 
has been concerned with the problem, posed by Peres and Wootters (1991): 
can more information be obtained about the common state of n identical 
quantum systems from a single measurement on the joint system formed 
by bringing the n systems together, or does it suffice to combine separate 
measurements on the separate systems? A useful tool for our studies is the
quantum Cramér-Rao bound with its companion notion of quantum information,
introduced by C.W. Helstrom in a sequence of papers in the sixties
and later refined by, among others, A.S. Holevo.

Quantum statistics mainly consists of exact results in various rather special
models; see the books of Helstrom (1976) and Holevo (1982). Just as in
ordinary statistics, the Cramér-Rao bound on the variance of an unbiased
estimator is rarely achieved exactly (only in so-called quantum exponential
models). In any case, one would not want in practice to restrict attention
to unbiased estimators only. There are results on optimal invariant methods,
but again, few models have the structure needed for these results to be
applicable, and even then the restriction to invariant statistical methods is
not entirely compelling.

One might hope that asymptotically it would be possible to achieve the
Cramér-Rao bound. However, asymptotic theory is so far very little developed
in quantum statistics, one reason being that the powerful
modern tools of asymptotic statistics (contiguity, local asymptotic normality,
and so on) are just not available² since even if we are considering measurements
of n identical quantum systems, there is no a priori reason to suppose
that a particular sequence of measurements on n quantum systems together
will satisfy these conditions. Here, we make a little progress through use
of the van Trees inequality (see Gill and Levit, 1995), a Bayesian Cramér-Rao
bound, which will allow us to make asymptotic optimality statements
without assuming or proving local asymptotic normality. Another useful
ingredient will be the recent derivation of the quantum Cramér-Rao bound
by Braunstein and Caves (1994), linking quantum information to classical
expected Fisher information in a particularly neat way.

¹ Also highly recommended: Sheldon Goldstein, 'Quantum mechanics without
observers', Physics Today, March, April 1998; letters to the editor, Physics
Today, February 1999.

We will show that for certain problems, a new Cramér-Rao type inequality
of Gill and Massar (1998) does provide an asymptotically achievable
bound on the quality of an estimator of unknown parameters. For some
other problems the issue remains largely open, and we identify situations
where Peres and Wootters' question has an affirmative answer: there can
be appreciably more information in a joint measurement of several particles
than in combining separate measurements on separate particles. This
clarifies an earlier affirmative answer of Massar and Popescu (1995), which
turned out to improve on separate measurements only for small samples. It
also clarifies the recent findings of Vidal et al. (1998).

Helstrom wrote in the epilogue to his (1976) book: "Mathematical statisticians
are concerned with asymptotic properties of estimators. When the parameters
of a quantum density operator are estimated on the basis of many
independent observations, how does the accuracy of the estimates depend on 
the number of the observations as that number grows very large? Under what 
conditions have the estimators asymptotically normal distributions? Prob- 
lems such as these, and still others that doubtless will occur to physicists and 
mathematicians, remain to be solved within the framework of the quantum 
mechanical theory." More than twenty years later this programme is still 
hardly touched (some of the few contributions are by Brody and Hughston 
(1998) and earlier papers, and Holevo (1983)) but we feel we have made a 
start here. 

In 20 ± ε pages (even when ±ε = +10) it is difficult to give a complete
introduction to the topic, as well as a clear picture of recent results. The
classic books by Helstrom and Holevo mentioned above are still the only
books on quantum statistics and they are very difficult indeed to read for
a beginner. A useful resource is the survey paper by Malley and Hornstein
(1993). However the latter authors, like many others, take the stance that
the randomness occurring in quantum physics cannot be caught in a standard
Kolmogorovian framework. We argue elsewhere (Gill, 1998), in a critique of
an otherwise excellent introduction to the related field of quantum probability
(Kümmerer and Maassen, 1998), that this is nonsense. With more space
at our disposal we would have included extensive worked examples; however
they have been replaced by exercises so that the reader can supply some of
the extra pages (but, unless you are Willem van Zwet, leave the starred
exercises for later).

Some references which we found especially useful in getting to grips with
the mathematical modelling of quantum phenomena are the books by Peres
(1995) and Isham (1995). To get into quantum probability, we recommend
Biane (1995) or Meyer (1986).

This introductory section continues with three subsections summarizing
the basic theory: first the mathematical model of states and measurements;
secondly the basic facts about the simplest model, namely a two-state
system; and thirdly the basic quantum Cramér-Rao bound. That third subsection
finishes with a glimpse of how one might do asymptotically optimal
estimation in one-parameter models: in a preliminary stage, obtain a rough
estimate of the parameter from a small number of our n particles; estimate
the so-called quantum score at this point; and then go on to measure it in
the second stage on the remaining particles. Section 2 states a recent
version of the quantum Cramér-Rao bound which makes precise how one
might trade information between different components of a parameter vector.
Section 3 outlines the procedure for asymptotically optimal estimation
of more than one parameter, again a two-stage procedure. This is work 'in
progress', so some results are conjectural, imprecise, or improvable. In a
final short section we try to explain how some of our results are connected
to the strange phenomenon of non-locality without entanglement, a hot topic
in the theory of quantum information and computation.

1.1 The basic set-up 

Quantum statistics has two basic building blocks: the mathematical specification
of the state of a quantum system, to be denoted by $\rho = \rho(\theta)$ as it
possibly depends on an unknown parameter $\theta$, and the mathematical
specification of the measurement, denoted by $M$, to be carried out on that system.
We will give the recipe for the probability distribution of the observable
outcome (a value $x$ of a random variable $X$, say) when measurement $M$ is
carried out on a system in state $\rho$. Since the state $\rho$ depends on an unknown
parameter $\theta$, the distribution of $X$ depends on $\theta$ too, thereby setting a
statistical problem of how best to estimate or test the value of $\theta$. Since we may
in practice have a choice of which measurement $M$ to take, we have a design
problem of choosing the best measurement for our purposes. (There is also
a recipe for the state of the system after measurement, depending on the
outcome, but we do not need it here; see Bennett et al., 1998.)

For simplicity we restrict attention to finite-dimensional quantum systems.
The state of a $d$-dimensional quantum system will be modelled or specified
by a $d \times d$ complex matrix $\rho$ called the density matrix of the system. For
instance, when we measure the spin of an electron in a particular direction
only two different values can occur, conventionally called 'up' and 'down'.
One could call this a two-state system; we need a two-dimensional ($d = 2$)
state space. Similarly, if we measure whether a photon is polarized in a particular
direction by passing it through a polarization filter, it either passes or does
not pass the filter. Again, polarization measurements on a single photon can
be discussed in terms of a two-dimensional system. If we consider the spins
of $n$ electrons, then $2^n$ different outcomes are possible and the system of $n$
electrons together (or rather, their spins) is described by a $d \times d$ matrix $\rho$
with $d = 2^n$.

Definition 1.1 (Density matrix) The density matrix $\rho$ of a $d$-dimensional
quantum system is a $d \times d$ self-adjoint, nonnegative matrix of trace 1.

'Self-adjoint' means that $\rho^* = \rho$, where the $*$ denotes the complex conjugate
and transpose of the matrix. That $\rho$ is nonnegative means that $\psi^* \rho \psi \ge 0$
for all column vectors $\psi$ (since $\rho$ is self-adjoint this quadratic form is a real
number). We often use the Dirac bra-ket notation whereby $|\psi\rangle$ (called a
ket) is written for the column vector $\psi$ and $\langle\psi|$ (a bra) is written for its
adjoint, the row vector containing the complex conjugates of its elements.
The quadratic form $\psi^* \rho \psi$ is then denoted $\langle\psi|\rho|\psi\rangle$.

It follows that the diagonal elements of a density matrix are nonnegative
reals adding up to one. Moreover, by the eigenvalue-eigenvector decomposition
of self-adjoint matrices we can write $\rho = \sum_i p_i |i\rangle\langle i|$, where the kets
$|i\rangle$ are the orthonormal eigenvectors of $\rho$, $\langle i|j\rangle = \delta_{ij}$, and the $p_i$ are the
eigenvalues: nonnegative real numbers adding up to one. One says that the
density matrix $\rho$ represents the mixed state obtained by taking with probability
$p_i$ the system in the pure state $|i\rangle$. The state vector of a pure state is
also called a wave-function.
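
As an illustration of Definition 1.1 and of the eigenvalue-eigenvector decomposition, the following sketch (Python with NumPy) checks the three defining properties numerically and rebuilds a density matrix as a mixture of its eigenprojectors. The function name `is_density_matrix` and the example matrix are illustrative choices made for this sketch, not taken from the paper.

```python
import numpy as np

def is_density_matrix(rho, tol=1e-10):
    """Check Definition 1.1: self-adjoint, nonnegative, trace 1."""
    self_adjoint = np.allclose(rho, rho.conj().T, atol=tol)
    eigenvalues = np.linalg.eigvalsh(rho)          # real for self-adjoint input
    nonnegative = eigenvalues.min() >= -tol
    unit_trace = abs(np.trace(rho) - 1) < tol
    return bool(self_adjoint and nonnegative and unit_trace)

# Illustrative 2 x 2 example (values chosen arbitrarily for this sketch).
rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])

# Spectral decomposition rho = sum_i p_i |i><i|.
p, kets = np.linalg.eigh(rho)                      # columns of `kets` are the |i>
rebuilt = sum(p_i * np.outer(k, k.conj()) for p_i, k in zip(p, kets.T))

print(is_density_matrix(rho))       # the three properties hold for this example
print(p, p.sum())                   # eigenvalues: nonnegative, adding up to one
print(np.allclose(rebuilt, rho))    # the mixture of eigenprojectors gives rho back
```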

Definition 1.2 (Measurement) A measurement $M$ on a $d$-dimensional
quantum system taking values $x$ in a measurable space $(\mathcal{X}, \mathcal{A})$ is specified
by an operator-valued probability measure, or oprom for short, that is, a
collection of self-adjoint matrices $\{M(A) : A \in \mathcal{A}\}$ such that

1. $M(\mathcal{X}) = \mathbf{1}$, the identity matrix.

2. Each $M(A)$ is nonnegative.

3. For disjoint $A_i$, $M(\bigcup_i A_i) = \sum_i M(A_i)$.

Note that these three rules are the ordinary axioms of a probability measure
on $(\mathcal{X}, \mathcal{A})$, except that the measure takes values in the self-adjoint matrices
instead of the real numbers. The sample space $\mathcal{X}$ might be the real numbers
or a subset thereof, with the Borel sigma-algebra, but it could also be
anything else.

Measurements are often called generalised measurements, to contrast 
them with a special subclass of measurements called simple measurements 
which we will introduce in a moment. In the literature the abbreviations 
'povm' (positive operator valued measure) and 'pom' (probability operator 
matrices) are often used, which we however find inaccurate. 

Now we can give the so-called trace rule telling us the probability distribution
of the random outcome $X$ when $M$ is used to measure $\rho$:

Definition 1.3 (trace rule) The probability distribution of the outcome $X$
is given by

$$\Pr\{X \in A\} = \operatorname{trace}(\rho M(A)), \qquad A \in \mathcal{A}. \tag{1}$$

Exercise 1.1 (legitimacy of trace rule) Prove that (1) indeed defines
a probability measure on $(\mathcal{X}, \mathcal{A})$.
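
A numerical sanity check (not a proof) of the trace rule for a finite outcome space can be sketched as follows. The three-outcome measurement and the state below are hypothetical examples chosen for this sketch; `outcome_distribution` is an illustrative name.

```python
import numpy as np

def outcome_distribution(rho, measurement):
    """Trace rule (1) on a finite outcome space: Pr{X = x} = trace(rho M(x))."""
    return np.array([np.trace(rho @ M).real for M in measurement])

# Two fixed self-adjoint 2 x 2 matrices used to build the example.
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical three-outcome measurement: three nonnegative matrices adding to 1.
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
measurement = [(I2 + np.sin(t) * sx + np.cos(t) * sz) / 3 for t in angles]

rho = np.array([[0.8, 0.1], [0.1, 0.2]], dtype=complex)   # hypothetical state
probs = outcome_distribution(rho, measurement)

print(np.allclose(sum(measurement), I2))   # M of the whole outcome space is 1
print(probs, probs.sum())                  # nonnegative numbers adding up to one
```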

One can argue from basic principles of quantum mechanics that however 
one measures a quantum system, the result must be an affine mapping from 
density matrices to the space of probability distributions on the outcome 
space. It is a theorem that any such mapping can be represented by an 
oprom. Thus the class of oproms contains all conceivable measurements. 
On the other hand, as we will see later, any oprom can be realised by some 
concrete experimental set-up, at least in principle, so the definition captures 
exactly what it should. 

A special kind of measurement plays a key role in theory and practice:
the so-called simple measurements, defined as follows.

Definition 1.4 (Simple measurement) A simple measurement $\Pi$ on a
$d$-dimensional quantum system taking values $x$ in a measurable space $(\mathcal{X}, \mathcal{A})$
is a measurement such that each $\Pi(A)$ is idempotent, i.e., is a projector onto
a subspace of $\mathbb{C}^d$.

It follows that the measurement takes on at most $d$ different values,
i.e., there exist $x_1, \ldots, x_k \in \mathcal{X}$ with $k \le d$ such that $\Pi(\{x_1, \ldots, x_k\}) = \mathbf{1}$.
Writing $\Pi(x_i)$ as abbreviation for $\Pi(\{x_i\})$, the matrices $\Pi(x_i)$ project onto $k$
orthogonal subspaces of $\mathbb{C}^d$ together spanning the whole space. For real-valued
outcomes, let us now define a self-adjoint matrix $X$ (not to be confused with
the random variable $X$ representing the outcome of the measurement) by
$X = \sum_i x_i \Pi(x_i)$. Then the $x_i$ are the eigenvalues of $X$ and the $\Pi(x_i)$
project onto the eigenspaces.
Conversely, given a self-adjoint matrix $X$ one can construct a corresponding
simple measurement or projector-valued probability measure. In this role
we call $X$ an observable. It follows that the expected value of the outcome
of a measurement of $X$ is given by $\operatorname{trace}(\rho X)$. For an ordinary real function
$f$ (e.g., square, inverse, logarithm, ...) one defines the same function of
the observable $X$ by $f(X) = \sum_i f(x_i)\Pi(x_i)$, and the expected value of the
outcome of a measurement of the observable $f(X)$ is $\operatorname{trace}(\rho f(X))$.
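
The passage from an observable to its simple measurement can be sketched numerically. In this sketch `simple_measurement` is an illustrative helper (not from the paper) that groups eigenvectors by eigenvalue; the observable and state are hypothetical examples.

```python
import numpy as np

def simple_measurement(X, tol=1e-9):
    """Map each distinct eigenvalue x_i of self-adjoint X to its projector Pi(x_i)."""
    eigenvalues, eigenvectors = np.linalg.eigh(X)
    projectors = {}
    for x, v in zip(eigenvalues, eigenvectors.T):
        # Group numerically equal eigenvalues so a degenerate eigenspace gets one projector.
        key = next((k for k in projectors if abs(k - x) < tol), float(x))
        projectors[key] = projectors.get(key, 0) + np.outer(v, v.conj())
    return projectors

X = np.array([[2.0, 1 - 1j], [1 + 1j, 3.0]])     # hypothetical observable
rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])      # hypothetical state

Pi = simple_measurement(X)
probs = {x: np.trace(rho @ P).real for x, P in Pi.items()}    # trace rule (1)
mean_outcome = sum(x * pr for x, pr in probs.items())
X_squared = sum(x**2 * P for x, P in Pi.items())              # f(X) with f = square

print(np.isclose(mean_outcome, np.trace(rho @ X).real))   # expected value is trace(rho X)
print(np.allclose(X_squared, X @ X))                      # f(X) agrees with the matrix square
```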

Simple measurements are often called von Neumann measurements. We 
will occasionally use the term 'proprom' (projector-valued probability mea- 
sure). Physicists generally agree that any simple measurement could in 
principle be implemented in practice. 

'Between measurements' a quantum system evolves deterministically according
to the famous Schrödinger equation, a differential equation for the
component pure states $|i\rangle$ of a given mixed system. One thinks of a measurement
as taking place instantaneously. After the measurement, the quantum
system jumps to a new state (depending on the outcome $x$); this is called
'the collapse of the wave function'. Again some simple rules specify what
happens, but we will not give them here.

If we bring two separate quantum systems together into some kind of in- 
teraction then their future evolutions will be linked together. Measurements 
can be made on the 'joint system', including all the separate measurements 
on each of the separate systems but many more besides. Mathematically 
this is modelled as follows: 

Definition 1.5 (product system) Consider two quantum systems, of dimension
$d$ and $d'$, in states $\rho$ and $\rho'$ respectively. Together the two are in
the state $\rho \otimes \rho'$ on $\mathbb{C}^d \otimes \mathbb{C}^{d'} = \mathbb{C}^{d d'}$, where $\otimes$ denotes the tensor product (of
matrices, vectors, or spaces as appropriate).

For the reader who is not familiar with tensor products, the tensor product
of $\mathbb{C}^d$ with $\mathbb{C}^{d'}$ has as basis the tensor products of each element of a basis
of $\mathbb{C}^d$ with each element of a basis of $\mathbb{C}^{d'}$. One can take linear combinations
of tensor products $\psi \otimes \psi'$ by expanding bilinearly in chosen bases of the two
spaces. Tensor products of matrices are defined in the natural way by how
they operate on products of vectors: $(X \otimes X')(\psi \otimes \psi') = X\psi \otimes X'\psi'$. The
trace of a tensor product of two matrices is the product of their traces.

Suppose $M$ and $M'$ are measurements on two separate quantum systems
$\rho$ and $\rho'$. Then we can define a joint measurement $M \otimes M'$ on the combined
system in the obvious way, taking values in the product of the outcome
spaces of $M$ and $M'$.

Exercise 1.2 (product measurement) Show that the outcome of measurement
of $M \otimes M'$ on a system in state $\rho \otimes \rho'$ is distributed as independent
realisations of measurement of $M$ and $M'$ on $\rho$ and $\rho'$ respectively.

However the important point is that bringing two quantum systems together 
allows many more measurements than just product measurements (which as 
we saw from exercise 1.2 are not very interesting). 
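
The claim of Exercise 1.2 can be checked numerically (this is a sanity check, not the requested proof) using the Kronecker product as the tensor product of matrices. The states, measurements and helper names below are hypothetical choices for this sketch.

```python
import numpy as np

def distribution(rho, measurement):
    """Trace rule (1) for a measurement given as a list of matrices M(x)."""
    return np.array([np.trace(rho @ M).real for M in measurement])

def product_measurement(M, M_prime):
    """Matrices M(x) (x) M'(x') of the joint measurement, indexed by (x, x')."""
    return [[np.kron(A, B) for B in M_prime] for A in M]

# Hypothetical two-state examples.
rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])
rho_prime = np.array([[0.4, 0.1], [0.1, 0.6]], dtype=complex)
M = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]               # 'up' / 'down'
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
M_prime = [np.outer(plus, plus), np.outer(minus, minus)]     # a rotated basis

joint_state = np.kron(rho, rho_prime)
joint = np.array([[np.trace(joint_state @ AB).real for AB in row]
                  for row in product_measurement(M, M_prime)])
product_of_marginals = np.outer(distribution(rho, M),
                                distribution(rho_prime, M_prime))

print(np.allclose(joint, product_of_marginals))   # the two outcomes are independent
```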

Product systems are important for two main reasons. Firstly, one of the
main themes of this paper is going to be: if we have $n$ independent systems
each in the same state $\rho(\theta)$ (i.e., in identical states all depending on the same
unknown parameter $\theta$), can we learn more about $\theta$ from a joint measurement
on the $d^n$-dimensional combined system $\rho^{\otimes n}(\theta)$ than from separate
measurements on the individual systems? In the next section we will
discuss some of the history and other background to this question, which
has been the subject of a series of papers in recent years. Secondly, product
systems play a role in the realisation of generalised measurements. It is a
theorem (due to Naimark) that any generalised measurement whatever can
be realised by a simple measurement after a 'quantum randomisation'. That
is to say, given any measurement $M$ there exists a so-called ancillary system
in state $\rho'$ and a simple measurement $\Pi$ on the joint system $\rho \otimes \rho'$ such that
$\operatorname{trace}(\rho M(A)) = \operatorname{trace}((\rho \otimes \rho')\Pi(A))$ for all $A$ and whatever $\rho$.

1.2 Spin half 

In order to make the above rather abstract concepts a little more concrete, let 
us go to the most simple special case, d = 2. This is the appropriate set-up 
for studying spin-half systems like the electron. We will see that we can asso- 
ciate the state of a spin-half system with a real vector a of length less than or 
equal to 1 in ordinary three dimensional space, and a simple measurement — 
which can take on at most two different values — with a direction in space, 
or a unit vector u. The trace rule (1) will reduce to a very simple formula 
involving a and u. The model applies to the famous Stern-Gerlach exper- 
iment, featuring in many introductory textbooks on quantum physics. In 
that experiment silver atoms were made to pass through a strongly varying 
magnetic field, having a certain direction. Each atom was either deflected 
upwards or downwards with respect to the direction of the field. The deflec- 
tion is due to the spin of the outermost electron in the silver atom, which can 
be characterized by a vector a. The orientation of the magnet determines 
which measurement is being taken, i.e., the value of u. 

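First we take some time to study some special features of the $2 \times 2$
self-adjoint matrices. The properties we find will greatly simplify calculations.
Let $\mathbf{1}$ denote the identity matrix and define the Pauli spin matrices
as follows:

Definition 1.6 (Pauli spin matrices)

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{2}$$

These three matrices are self-adjoint, each has trace zero and determinant
minus one, hence has eigenvalues $\pm 1$. They satisfy (check this yourself!)

$$\begin{aligned}
\sigma_x \sigma_y &= -\sigma_y \sigma_x = i\sigma_z, \\
\sigma_y \sigma_z &= -\sigma_z \sigma_y = i\sigma_x, \\
\sigma_z \sigma_x &= -\sigma_x \sigma_z = i\sigma_y, \\
\sigma_x^2 &= \sigma_y^2 = \sigma_z^2 = \mathbf{1}.
\end{aligned} \tag{3}$$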



An arbitrary self-adjoint $2 \times 2$ complex matrix has to be of the form

$$X = \begin{pmatrix} u + z & x - iy \\ x + iy & u - z \end{pmatrix} \tag{4}$$

where $x$, $y$, $z$, $u$ are uniquely determined real numbers. Thus we can write

$$X = u\mathbf{1} + x\sigma_x + y\sigma_y + z\sigma_z. \tag{5}$$
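
The standard product relations of the Pauli matrices and the expansion (5) are easy to confirm numerically. The sketch below assumes the usual Pauli matrices; the helper `pauli_coefficients` is an illustrative name, and it relies on the standard fact that the coefficients can be read off as half-traces.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (sx, sy, sz)

def pauli_coefficients(X):
    """Return real (u, x, y, z) with X = u*1 + x*sx + y*sy + z*sz, as in (5)."""
    u = np.trace(X).real / 2
    x, y, z = (np.trace(X @ s).real / 2 for s in paulis)
    return u, x, y, z

# Product relations: cyclic products, anticommutation and squares.
cyclic_ok = (np.allclose(sx @ sy, 1j * sz) and np.allclose(sy @ sz, 1j * sx)
             and np.allclose(sz @ sx, 1j * sy))
anticommute_ok = np.allclose(sx @ sy, -(sy @ sx))
squares_ok = all(np.allclose(s @ s, I2) for s in paulis)

X = np.array([[1.5, 0.3 - 0.4j], [0.3 + 0.4j, -0.5]])   # arbitrary self-adjoint matrix
u, x, y, z = pauli_coefficients(X)
print(cyclic_ok, anticommute_ok, squares_ok)
print(np.allclose(u * I2 + x * sx + y * sy + z * sz, X))   # expansion (5) reproduces X
```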

Specializing to density matrices, the requirement that $\operatorname{trace} \rho = 1$ imposes
the condition that $u = \frac12$. The requirement that $\rho$ is nonnegative is
equivalent to its determinant being nonnegative, or $u^2 - z^2 - x^2 - y^2 \ge 0$,
or $x^2 + y^2 + z^2 \le \frac14$. It is convenient to write

$$\rho = \rho(a) = \tfrac12(\mathbf{1} + a \cdot \sigma) \tag{6}$$

where $a = (a_x, a_y, a_z) \in \mathbb{R}^3$ satisfies

$$\|a\|^2 = a_x^2 + a_y^2 + a_z^2 \le 1 \tag{7}$$

while $\sigma = (\sigma_x, \sigma_y, \sigma_z)$ (a vector of matrices) and '$\cdot$' denotes the inner product.
Thus the space of density matrices of a two-dimensional quantum system
can be represented by the closed unit ball $B$ in three-dimensional Euclidean
space. The sphere $S$, or surface of the unit ball, corresponds to density
matrices $\frac12(\mathbf{1} + a \cdot \sigma)$ with $\|a\|^2 = 1$, which are singular since their determinant
is zero. Such a density matrix therefore has eigenvalues 0 and 1. It represents
a so-called pure state.
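
The correspondence between points of the unit ball and density matrices can be sketched as follows. The function names `rho` and `bloch_vector` and the example vectors are illustrative; recovering $a$ as $a_i = \operatorname{trace}(\rho\sigma_i)$ is a standard fact assumed here rather than stated in the text.

```python
import numpy as np

SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)    # (sigma_x, sigma_y, sigma_z)

def rho(a):
    """Density matrix (6): rho(a) = (1 + a . sigma) / 2 for a in the unit ball."""
    a = np.asarray(a, dtype=complex)
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", a, SIGMA))

def bloch_vector(density):
    """Recover a from rho(a) via a_i = trace(rho sigma_i)."""
    return np.array([np.trace(density @ s).real for s in SIGMA])

interior = rho([0.3, -0.2, 0.5])      # ||a|| < 1: a mixed state
surface = rho([0.0, 0.6, 0.8])        # ||a|| = 1: a pure state

print(bloch_vector(interior))               # returns the vector we started from
print(np.linalg.eigvalsh(interior))         # both eigenvalues strictly positive
print(np.linalg.eigvalsh(surface))          # eigenvalues 0 and 1, up to rounding
print(np.linalg.det(surface).real)          # determinant (1 - ||a||^2) / 4 vanishes
```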

The density matrix of a pure state is a projector matrix, projecting onto
a one-dimensional subspace of $\mathbb{C}^2$. Letting $u$ denote a unit vector in $\mathbb{R}^3$, let
us write $\Pi(u) = \rho(u) = \frac12(\mathbf{1} + u \cdot \sigma)$ for this matrix. Check using (3) that $\Pi(u)$
is idempotent, and that $\Pi(u)$ and $\Pi(-u)$ commute (in fact, their product is
the zero matrix) and add to the identity matrix! Thus the projectors $\Pi(u)$
and $\Pi(-u)$ project onto two orthogonal one-dimensional subspaces of $\mathbb{C}^2$.
We will specify these spaces exactly in a moment. The only other projector
matrices are $\mathbf{0}$ and $\mathbf{1}$, projecting onto the trivial subspace and the whole
space $\mathbb{C}^2$ respectively.

It follows that for an arbitrary density matrix $\rho = \rho(a)$ with $a \ne 0$, defining the unit
vector $u = a/\|a\|$ and the probabilities $\alpha = \frac12(1 + \|a\|)$, $\beta = 1 - \alpha = \frac12(1 - \|a\|)$, we have

$$\rho(a) = \tfrac12(\mathbf{1} + a \cdot \sigma) = \tfrac12(1 + \|a\|)\,\rho(a/\|a\|) + \tfrac12(1 - \|a\|)\,\rho(-a/\|a\|) = \alpha\rho(u) + \beta\rho(-u). \tag{8}$$

It has eigenvalues $\alpha$ and $\beta$, and its eigenvectors, column vectors in $\mathbb{C}^2$, generate
the spaces onto which $\Pi(u)$ and $\Pi(-u)$ project. One may consider
the state $\rho(a)$ as the mixture, with probabilities $\alpha$ and $\beta$, of the pure states
$\rho(u)$ and $\rho(-u)$ (though this is only one of many representations of $\rho$ as a
mixture of pure states).
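
A quick numerical check of this decomposition, with weights $\frac12(1 \pm \|a\|)$ on the two antipodal pure states, might look as follows. The vector `a` is an arbitrary example and `rho` is the illustrative helper implementing $\rho(a) = \frac12(\mathbf{1} + a \cdot \sigma)$.

```python
import numpy as np

SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)    # (sigma_x, sigma_y, sigma_z)

def rho(a):
    """rho(a) = (1 + a . sigma) / 2."""
    a = np.asarray(a, dtype=complex)
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", a, SIGMA))

a = np.array([0.3, -0.2, 0.5])                # arbitrary point inside the unit ball
norm = np.linalg.norm(a)
u = a / norm
alpha, beta = (1 + norm) / 2, (1 - norm) / 2  # weights of rho(u) and rho(-u)

mixture = alpha * rho(u) + beta * rho(-u)
eigenvalues = np.linalg.eigvalsh(rho(a))      # returned in ascending order

print(np.allclose(mixture, rho(a)))                    # the mixture reproduces rho(a)
print(np.allclose(eigenvalues, [beta, alpha]))         # eigenvalues are beta and alpha
print(np.allclose(rho(u) @ rho(-u), 0))                # Pi(u) Pi(-u) is the zero matrix
```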

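So what are these spaces exactly? The vector $u$ is a point on the unit
sphere in $\mathbb{R}^3$. Let $\theta$ and $\phi$ denote its polar coordinates, where $\theta \in [0, \pi]$
is the latitude measured from the North pole ($z$-axis) and $\phi \in [0, 2\pi)$ is
the longitude, measured from the $x$-axis. (We should really say co-latitude
rather than latitude.) Thus $u = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$. Define the
column vector $|\psi\rangle = |\psi(\theta, \phi)\rangle$ in $\mathbb{C}^2$ by

$$|\psi(\theta, \phi)\rangle = \begin{pmatrix} e^{-i\phi/2}\cos(\theta/2) \\ e^{i\phi/2}\sin(\theta/2) \end{pmatrix}. \tag{9}$$

Note that $\langle\psi|\psi\rangle = 1$ while

$$\begin{aligned}
|\psi\rangle\langle\psi| &= \begin{pmatrix} \cos^2(\theta/2) & e^{-i\phi}\cos(\theta/2)\sin(\theta/2) \\ e^{i\phi}\cos(\theta/2)\sin(\theta/2) & \sin^2(\theta/2) \end{pmatrix} \\
&= \frac12 \begin{pmatrix} 1 + \cos\theta & (\cos\phi - i\sin\phi)\sin\theta \\ (\cos\phi + i\sin\phi)\sin\theta & 1 - \cos\theta \end{pmatrix} \\
&= \tfrac12(\mathbf{1} + u \cdot \sigma) = \Pi(u).
\end{aligned} \tag{10}$$

Any complex vector $|\xi\rangle$ of length 1 can be written as $e^{i\alpha}|\psi(\theta, \phi)\rangle$ for some
$\alpha \in [0, 2\pi)$ and polar coordinates $\theta$, $\phi$. Note that $|\xi\rangle\langle\xi| = |\psi\rangle\langle\psi| = \Pi(u)$,
and that $|\psi(\theta, \phi)\rangle$ and $|\psi(\pi - \theta, \phi + \pi)\rangle$ are orthogonal. The corresponding
points on the unit sphere are opposite to one another. Combining these facts
we obtain: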



Rule 1.1 (Spin-half density matrices, projectors) The density matrix
$\rho(a)$, where $a$ is a point in the unit ball in $\mathbb{R}^3$, has eigenvalues
$\frac12(1 + \|a\|)$ and $\frac12(1 - \|a\|)$
and normalized eigenvectors $|\psi(\theta, \phi)\rangle$, $|\psi(\pi - \theta, \phi + \pi)\rangle$, where $\theta$ and $\phi$ are
the polar coordinates of $u = a/\|a\|$. The projector matrix $\Pi(u)$ projects onto
the one-dimensional subspace of $\mathbb{C}^2$ spanned by $|\psi(\theta, \phi)\rangle$. The projector onto
the space orthogonal to this, spanned by $|\psi(\pi - \theta, \phi + \pi)\rangle$, is $\Pi(-u)$.

Let $u$ and $v$ be two unit vectors in $\mathbb{R}^3$ and write $|u\rangle$ and $|v\rangle$ for the
corresponding unit vectors in $\mathbb{C}^2$; so $|u\rangle$ is an abbreviation for $|\psi(\theta, \phi)\rangle$
where $\theta$, $\phi$ are the polar coordinates of $u$. Since $\Pi(u) = |u\rangle\langle u|$ we see that
$\operatorname{trace} \Pi(u)\Pi(v) = \operatorname{trace} |u\rangle\langle u|v\rangle\langle v| = \langle v|u\rangle\langle u|v\rangle = |\langle u|v\rangle|^2$. On the
other hand, using the properties (3) of the Pauli matrices, one readily computes
$\operatorname{trace} \Pi(u)\Pi(v) = \frac12(1 + u \cdot v)$. Now $u \cdot v$ is the cosine of the angle
between the vectors $u$ and $v$, hence $\frac12(1 + u \cdot v)$ is the squared cosine of half
the angle between $u$ and $v$.

Rule 1.2 (Calculation rule) The squared absolute value of the inner product
between the complex vectors $|u\rangle$ and $|v\rangle$ in $\mathbb{C}^2$ is the squared cosine of
half the angle between the corresponding unit vectors $u$ and $v$ in $\mathbb{R}^3$. In particular,
opposite points on the unit sphere correspond to orthogonal vectors
in $\mathbb{C}^2$.
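
The two rules can be illustrated numerically. The sketch below builds the state vector for given polar coordinates as $(e^{-i\phi/2}\cos(\theta/2),\ e^{i\phi/2}\sin(\theta/2))$ and compares the squared overlap of two such vectors with the squared cosine of half the angle between the directions; the function names and the particular angles are arbitrary choices for this sketch.

```python
import numpy as np

def ket(theta, phi):
    """State vector for the direction with co-latitude theta and longitude phi."""
    return np.array([np.exp(-1j * phi / 2) * np.cos(theta / 2),
                     np.exp(1j * phi / 2) * np.sin(theta / 2)])

def direction(theta, phi):
    """Unit vector u in R^3 with polar coordinates (theta, phi)."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

theta_u, phi_u = 0.7, 1.9          # arbitrary directions chosen for this sketch
theta_v, phi_v = 2.1, 0.4

# np.vdot conjugates its first argument, so this is |<u|v>|^2.
overlap_sq = abs(np.vdot(ket(theta_u, phi_u), ket(theta_v, phi_v))) ** 2
cos_angle = direction(theta_u, phi_u) @ direction(theta_v, phi_v)
half_angle_sq = np.cos(np.arccos(cos_angle) / 2) ** 2

# Opposite points on the sphere should give orthogonal vectors in C^2.
antipode_overlap = np.vdot(ket(theta_u, phi_u), ket(np.pi - theta_u, phi_u + np.pi))

print(np.isclose(overlap_sq, half_angle_sq))   # Rule 1.2
print(abs(antipode_overlap) < 1e-12)           # orthogonality of antipodal states
```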

We can now describe the probability distributions of all simple measure- 
ments of a spin-half system. 

The state of the system is modelled by a $2 \times 2$ density matrix of the form
$\rho(a) = \frac12(\mathbf{1} + a \cdot \sigma)$ where $a$ is a point in the closed unit ball in $\mathbb{R}^3$.

The non-trivial simple measurements take on just two different values.
Consider a simple measurement $M = \Pi$ taking values in a set $\mathcal{X}$ consisting
of just two elements; let's call these elements $\pm 1$. The measurement is determined
by the two projectors $\Pi(\pm 1)$, which should project onto orthogonal
one-dimensional subspaces of $\mathbb{C}^2$. Each subspace is generated by a vector of
the form $|u\rangle$ for some $u$ on the unit sphere, and the associated projector
is $\Pi(u)$. Recall that opposite points $\pm u$ on the unit sphere correspond to
orthogonal vectors $|\pm u\rangle$ in $\mathbb{C}^2$, and hence to orthogonal projectors $\Pi(\pm u)$.
Thus a projector-valued probability measure for a simple measurement with
values in $\mathcal{X}$ is given by $M(\pm 1) = \Pi(\pm u) = \frac12(\mathbf{1} \pm u \cdot \sigma)$ for some $u$.

We apply the trace rule (1) to compute the probabilities of the two
outcomes $\pm 1$ when the simple measurement $M(\pm 1) = \Pi(\pm u)$ is carried out
on a system in the state $\rho(a) = \frac12(\mathbf{1} + a \cdot \sigma)$. Using the properties (3) of the
Pauli matrices, the reader should verify that these probabilities are

$$\operatorname{trace} \rho(a)\Pi(\pm u) = \tfrac12(1 \pm a \cdot u). \tag{11}$$
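
The outcome probabilities of a simple spin measurement can be computed directly from the trace rule and compared with the closed form $\frac12(1 \pm a \cdot u)$. In this sketch `rho` and `outcome_probabilities` are illustrative names, and the state and direction are hypothetical examples.

```python
import numpy as np

SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)    # (sigma_x, sigma_y, sigma_z)

def rho(a):
    """rho(a) = (1 + a . sigma) / 2; for a unit vector u this is the projector Pi(u)."""
    a = np.asarray(a, dtype=complex)
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", a, SIGMA))

def outcome_probabilities(a, u):
    """Probabilities of the outcomes +1 and -1 via the trace rule with M(+-1) = Pi(+-u)."""
    u = np.asarray(u, dtype=float)
    return tuple(np.trace(rho(a) @ rho(sign * u)).real for sign in (+1, -1))

a = np.array([0.3, -0.2, 0.5])        # hypothetical state inside the unit ball
u = np.array([0.0, 0.6, 0.8])         # measurement direction (unit vector)
p_plus, p_minus = outcome_probabilities(a, u)

print(p_plus, p_minus)                                 # here a.u = 0.28
print(np.isclose(p_plus, 0.5 * (1 + a @ u)))           # matches (1 + a.u) / 2
```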

Using further rules for the state of the system after measurement, it turns
out that after measurement the system is in the pure state $\rho(\pm u)$ according
to the outcome $\pm 1$. One can therefore go on to compute probabilities of
the series of outcomes of a series of simple measurements carried out on one
particle.

In the Stern-Gerlach experiment, the initial state of the silver atom is
described by the density matrix $\rho(0) = \frac12\mathbf{1}$. One can think of this state as
corresponding to an electron having spin in a random direction $u$ uniformly
distributed over the unit sphere. Indeed, if one takes the mean of
$\rho(u) = \frac12(\mathbf{1} + u \cdot \sigma)$ with $u$ uniformly distributed over the sphere, the matrix $\frac12\mathbf{1}$
results (though this representation of the 'completely random' state $\rho(0)$ as
a mixture of pure states is not unique; one also finds this state as the result
of choosing with equal probabilities $\frac12$ an electron in either of the orthogonal
pure states $|\pm u\rangle$).

Exercise 1.3 (A generalised measurement of a spin-half system) Let
$M(A) = \int_A \Pi(u)\,du/2\pi$, where $du$ denotes integration with respect to Lebesgue
surface measure on $S$. Show that $M$ is a generalised measurement on a spin-half
system with values in $S$, and compute the distribution of the outcome of
this measurement on the system $\rho(a)$. This measurement would be physically
realised by somehow coupling the spin-half system with a particle moving on
the sphere and measuring the position of that particle.

Exercise 1.4 (A generalised measurement of $n$ spin-half systems*)
For the state-space $(\mathbb{C}^2)^{\otimes n}$ define $|u\rangle_n = |u\rangle \otimes \cdots \otimes |u\rangle$ and define
$\Pi_n(u) = |u\rangle_n \langle u|_n$. Define $M(A) = (n + 1)\int_A \Pi_n(u)\,du/4\pi$ and show that $M(S)$ is the
projector onto the $(n + 1)$-dimensional subspace of vectors invariant under
permutation of the $n$ components of $(\mathbb{C}^2)^{\otimes n}$. Call this subspace $S_n$ and note
that $\operatorname{trace} \rho^{\otimes n}\Pi_{S_n} = 1$ for a pure state $\rho$. Show that $M$ defines a generalised measurement on
$n$ identical copies of a spin-half system with values in $S$, and compute the
distribution of the outcome of this measurement on the system $\rho^{\otimes n}(v)$.

A Stern-Gerlach magnet oriented in the direction $u$ implements the simple
measurement $M(\pm 1) = \Pi(\pm u)$. Since for $a = 0$ the probabilities (11)
both equal $\frac12$, one will find electrons with spin in the directions $\pm u$ with equal
probabilities. Electrons in the emerging '+' beam are in the pure state $\rho(u)$.
Sending them through a Stern-Gerlach device with orientation $v$ splits them
again, now with probabilities $\frac12(1 \pm u \cdot v)$ (for the '+' outcome, the squared
cosine of half the angle between the directions $u$ and $v$) into two beams of
electrons in the states $\rho(\pm v)$, and so on.
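
The repeated splitting of beams can be sketched with a few lines of code. The sketch uses only two facts from the text: a magnet with orientation $u$ splits a beam in state $\rho(b)$ in the proportions $\frac12(1 \pm b \cdot u)$, and the two emerging beams are in the pure states $\rho(\pm u)$. The function name `beam_fractions` and the chosen orientations are illustrative.

```python
import numpy as np

def beam_fractions(a, directions):
    """Fractions of particles in each final beam after successive Stern-Gerlach magnets.

    Starts from the state rho(a). After a magnet with orientation u the two beams
    are in the pure states rho(+u) and rho(-u), i.e. they have Bloch vectors +u, -u.
    Returns a dict mapping outcome sequences such as (+1, -1) to fractions.
    """
    beams = {(): (1.0, np.asarray(a, dtype=float))}
    for u in directions:
        u = np.asarray(u, dtype=float)
        split = {}
        for history, (fraction, b) in beams.items():
            for sign in (+1, -1):
                prob = 0.5 * (1 + sign * (b @ u))        # outcome probability
                split[history + (sign,)] = (fraction * prob, sign * u)
        beams = split
    return {history: fraction for history, (fraction, _) in beams.items()}

u = np.array([0.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 0.0])          # perpendicular to u

# Completely random initial state a = 0, then magnets along u and v.
print(beam_fractions([0.0, 0.0, 0.0], [u, v]))   # four beams of equal intensity
# Repeating the same orientation does not split the beams any further.
print(beam_fractions([0.0, 0.0, 0.0], [u, u]))
```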

If the electrons started out in the arbitrary mixed state $\rho(a)$ then the
first Stern-Gerlach magnet splits them into two output beams in the pure
states $\rho(\pm u)$ in the proportions $\frac12(1 \pm a \cdot u)$. So if $a$ was unknown, we do learn
something about it from counting the numbers of electrons in each beam.
Further operations on the output beams, however, will not teach us any more,
as the state of the electrons in either output beam no longer depends on $a$.

If we are allowed to measure a large number of electrons each in the same 
mixed state p(a), we see that a large number of Stern-Gerlach measurements 
in three linearly independent directions will enable us to determine a. The 
question we will study in the rest of the paper is: what is the best way to 
do this? Will it suffice to use simple measurements on separate particles or
can we do better by using more sophisticated measurements, in particular, 
joint measurements on several particles simultaneously? 

One can consider rotating a given coordinate system in $\mathbb{R}^3$ in such
a way as to transform the vectors a and u representing a state or a simple
measurement into convenient choices, e.g., we will in the future claim that
'without loss of generality $a = (0, 0, a_3)$', which makes $\rho(a)$ a
diagonal matrix. How to do this is given by the following (more difficult)
exercise:

Exercise 1.5 (Rotation of coordinate system*) For given unit vector u and
angle $\theta$ define $U = \exp(-i\theta\, u\cdot\sigma/2)$. Then
$UU^* = U^*U = \mathbf{1}$, i.e., U is a unitary transformation of
$\mathbb{C}^2$, and $U\rho(a)U^* = \rho(b)$ where $b \in \mathbb{R}^3$ results
from a by rotation about u through an angle $\theta$.

This result really belongs to the representation theory of groups, a major
topic having deep connections with quantum theory. It is a curious fact that
if $\theta = 2\pi$ the operator U is equal to $-\mathbf{1}$. So though U works
on a density matrix by a rotation through 360°, it does not transform a state
vector to itself but to its negative. A rotation through 720°, or the angle
$4\pi$, is needed to do this. The fact that two complete revolutions are
needed to transform a state vector into itself whereas one revolution
multiplies the state vector by $-1$ has been experimentally verified through
observation of interference effects.
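
Both claims of Exercise 1.5 can be checked numerically. The sketch below assumes the closed form $U = \cos(\theta/2)\mathbf{1} - i\sin(\theta/2)\,u\cdot\sigma$, which holds because $(u\cdot\sigma)^2 = \mathbf{1}$ for a unit vector u, and compares $U\rho(a)U^*$ with Rodrigues' rotation formula. All function names are illustrative.

```python
import math

SIGMA = (
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
)


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def dagger(a):
    """Conjugate transpose."""
    return [[a[c][r].conjugate() for c in range(2)] for r in range(2)]


def dot_sigma(v):
    """The matrix v.sigma."""
    return [[sum(v[i] * SIGMA[i][r][c] for i in range(3)) for c in range(2)]
            for r in range(2)]


def rho(a):
    """Density matrix (1 + a.sigma)/2."""
    s = dot_sigma(a)
    return [[((1 if r == c else 0) + s[r][c]) / 2 for c in range(2)]
            for r in range(2)]


def rotation_unitary(u, theta):
    """U = exp(-i theta u.sigma/2), written in closed form."""
    s = dot_sigma(u)
    ct, st = math.cos(theta / 2), math.sin(theta / 2)
    return [[ct * (1 if r == c else 0) - 1j * st * s[r][c] for c in range(2)]
            for r in range(2)]


def bloch_vector(r):
    """Recover a from rho(a) via a_i = trace(rho sigma_i)."""
    return [sum(r[j][k] * SIGMA[i][k][j] for j in range(2) for k in range(2)).real
            for i in range(3)]


def rotate(a, u, theta):
    """Rodrigues' formula: rotate a about the unit vector u through theta."""
    cross = [u[1] * a[2] - u[2] * a[1],
             u[2] * a[0] - u[0] * a[2],
             u[0] * a[1] - u[1] * a[0]]
    ua = sum(x * y for x, y in zip(u, a))
    ct, st = math.cos(theta), math.sin(theta)
    return [a[i] * ct + cross[i] * st + u[i] * ua * (1 - ct) for i in range(3)]
```

For any admissible a, u and $\theta$, `bloch_vector(matmul(matmul(U, rho(a)), dagger(U)))` should agree with `rotate(a, u, theta)`, and `rotation_unitary(u, 2 * math.pi)` should be minus the identity up to rounding.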

1.3 Quantum Cramér-Rao inequality

Consider a quantum statistical model whereby the density matrix $\rho$ depends
on an unknown parameter $\theta$. Possibly $\theta$ is a vector but we will
not emphasize that fact in the notation. In particular, a spin-half system has
a density matrix $\rho = \rho(a)$ depending on the vector a in the closed unit
ball, which we will denote by B. Interesting statistical models could
therefore have a one-, two- or three-dimensional parameter $\theta$,
specifying a curve, a surface, or an open region of B. Of particular interest
are one- and two-dimensional pure-state models, specifying a curve on the
boundary S of the unit ball B and the whole of S respectively. Results are
strikingly different according to whether the true value of $\theta$
corresponds to a point in S or in the interior of B. By a mixed-state model we
mean a model in the interior of B. By the full model, pure or mixed, we mean
the model '$\rho$ is in S' and '$\rho$ is in the interior of B' respectively.
By the natural parametrization of these models we mean the parametrization
$\rho = \rho(u)$, $\rho = \rho(a)$ respectively.³



³ It would be nice to express conditions and results in the language of
differential geometry, i.e., independent of the specific parametrizations of
the models under consideration.

The quantum Cramér-Rao bound involves a collection of self-adjoint matrices
$\lambda_i$ called the quantum score matrices, one for each component of
$\theta$, and a quantum information matrix. These are defined as follows.

Definition 1.7 (Quantum score matrices) Suppose $\rho = \rho(\theta)$ depends
on parameters $\theta = (\theta_1, \ldots, \theta_p)$. Suppose that $\rho$ is
differentiable with respect to $\theta$ and define self-adjoint matrices
$\lambda_i = \lambda_i(\theta)$ implicitly by the equation

(12)    $\rho_{,i} = \partial\rho/\partial\theta_i = \tfrac12(\lambda_i\rho + \rho\lambda_i).$

Note that the $\lambda_i = \lambda_i(\theta)$ will also depend on $\theta$.
Another name for these matrices is the symmetric logarithmic derivatives of
$\rho$ with respect to $\theta$. If $\rho$ and its derivative $\rho_{,i}$ with
respect to $\theta_i$ commute, then $\lambda_i$ is nothing else than the
derivative of $\log\rho$. By using a basis of $\mathbb{C}^d$ making $\rho$
diagonal, $\rho = \sum_j p_j\,|j\rangle\langle j|$, one can solve (12) to
obtain

(13)    $\langle j|\lambda_i|j'\rangle = 2\,\langle j|\rho_{,i}|j'\rangle / (p_j + p_{j'}).$

If some $p_j$ are zero the corresponding elements of $\lambda_i$ (those with
$p_j + p_{j'} = 0$) may be chosen arbitrarily (subject to self-adjointness)
without effect on subsequent calculations. If $\rho$ is a pure state, then
$\rho^2 = \rho$ and it follows from differentiating this equation with respect
to $\theta_i$ that in this case $\lambda_i = 2\rho_{,i}$.

Exercise 1.6 (mean quantum score zero) Show that the quantum score has
expectation zero, that is, the distribution of a measurement of the
observable $\lambda_i$ has mean zero, or trace$(\rho\lambda_i) = 0$.

Exercise 1.7 (Spin half, mixed) Consider the full mixed-state spin-half model
$d = 2$, $\rho = \tfrac12(\mathbf{1} + \theta\cdot\sigma)$, where $\theta$ is
three-dimensional and satisfies $\sum_i \theta_i^2 < 1$. Then
$\rho_{,i} = \tfrac12\sigma_i$ for each i. At the point
$\theta = (0, 0, \xi)$ the density matrix is diagonal with diagonal elements
$\tfrac12(1 \pm \xi)$ and the quantum scores are found from (13) to be
$\sigma_x$, $\sigma_y$ and $(1 - \xi^2)^{-1}(\sigma_z - \xi\mathbf{1})$.
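
The scores of Exercise 1.7 can be reproduced directly from formula (13). The following is a small sketch, assuming $\rho_{,i} = \tfrac12\sigma_i$ and the eigenvalues $\tfrac12(1 \pm \xi)$ of $\rho$ at $\theta = (0, 0, \xi)$; the helper names are illustrative.

```python
SIGMA = (
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
)


def scores_at(xi):
    """Quantum scores at theta = (0, 0, xi), element by element from (13)."""
    p = [(1 + xi) / 2, (1 - xi) / 2]          # eigenvalues of the diagonal rho
    scores = []
    for s in SIGMA:
        d_rho = [[s[j][k] / 2 for k in range(2)] for j in range(2)]
        scores.append([[2 * d_rho[j][k] / (p[j] + p[k]) for k in range(2)]
                       for j in range(2)])
    return scores


def mean_score(score, xi):
    """trace(rho * score) for the diagonal rho at theta = (0, 0, xi)."""
    p = [(1 + xi) / 2, (1 - xi) / 2]
    return sum(p[j] * score[j][j] for j in range(2))


def defining_equation_residual(score, index, xi):
    """Largest entry of (score*rho + rho*score)/2 - sigma_index/2, cf. (12)."""
    p = [(1 + xi) / 2, (1 - xi) / 2]
    worst = 0.0
    for j in range(2):
        for k in range(2):
            lhs = (score[j][k] * p[k] + p[j] * score[j][k]) / 2
            worst = max(worst, abs(lhs - SIGMA[index][j][k] / 2))
    return worst
```

For $0 \le \xi < 1$ the first two scores equal $\sigma_x$ and $\sigma_y$, the third is diagonal with entries $1/(1+\xi)$ and $-1/(1-\xi)$, and each has mean zero as Exercise 1.6 requires.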

Exercise 1.8 (Spin half, pure) The full pure-state spin-half model has
everything as in the previous exercise but now with $\sum_i \theta_i^2 = 1$.
A two-dimensional parametrization is called for, using, e.g., the polar
coordinates of the unit vector $\theta$. However on the Northern hemisphere we
can stick to $\theta = (\theta_1, \theta_2)$ with
$\theta_3 = +(1 - \theta_1^2 - \theta_2^2)^{1/2}$ and we find that at
$\theta = (0, 0)$ the quantum scores are $\sigma_x$ and $\sigma_y$.

Exercise 1.9 (n copies) Suppose $\rho^{(n)}(\theta) = \rho^{\otimes n}(\theta)$.
Then the quantum scores are given by

(14)    $\lambda_i^{(n)} = \lambda_i\otimes\mathbf{1}\otimes\cdots\otimes\mathbf{1} + \cdots + \mathbf{1}\otimes\cdots\otimes\mathbf{1}\otimes\lambda_i.$
Now we can define the quantum information matrix and state the original
quantum Cramér-Rao bound.

Definition 1.8 (Quantum information matrix) The quantum information matrix
$I_Q$ is defined by

(15)    $(I_Q)_{ii'} = \tfrac12\operatorname{trace}\big(\rho(\lambda_i\lambda_{i'} + \lambda_{i'}\lambda_i)\big).$

Check that this defines a real, positive semi-definite matrix!

Exercise 1.10 (n copies, continued) Show from (14) and Exercise 1.6 that the
quantum information $I_Q^{(n)}$ for a parameter $\theta$ in the system
$\rho^{\otimes n}$ is just n times the quantum information for $\theta$ in a
single copy of the system.
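
As a numerical illustration of Definition 1.8 and Exercise 1.10, the sketch below evaluates (15) for the full mixed-state spin-half model at $\theta = (0, 0, \xi)$, using the scores of Exercise 1.7, and then for two copies using (14) with $n = 2$. The function names are our own, and the check covers only this one model and $n = 2$.

```python
SIGMA = (
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
)
ID2 = [[1, 0], [0, 1]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def quantum_information(rho, scores):
    """Formula (15): (I_Q)_{ij} = trace(rho (l_i l_j + l_j l_i)) / 2."""
    k = len(scores)
    info = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            sym = add(matmul(scores[i], scores[j]), matmul(scores[j], scores[i]))
            info[i][j] = 0.5 * trace(matmul(rho, sym)).real
    return info


def one_copy(xi):
    """rho and the three scores at theta = (0, 0, xi), as in Exercise 1.7."""
    rho = [[(1 + xi) / 2, 0], [0, (1 - xi) / 2]]
    lam_z = [[(SIGMA[2][r][c] - xi * ID2[r][c]) / (1 - xi * xi) for c in range(2)]
             for r in range(2)]
    return rho, [SIGMA[0], SIGMA[1], lam_z]


def two_copies(xi):
    """rho (x) rho and the joint scores from (14) with n = 2."""
    rho, scores = one_copy(xi)
    joint = [add(kron(s, ID2), kron(ID2, s)) for s in scores]
    return kron(rho, rho), joint
```

With $\xi = 0.6$, `quantum_information(*one_copy(0.6))` should be diagonal with entries 1, 1 and $1/(1 - 0.36)$, and the two-copy matrix should be exactly twice that.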

Theorem 1.1 (Quantum Cramér-Rao bound) Define $I_M(\theta)$ to be the Fisher
information matrix for the parameter $\theta$ in the distribution of the
outcome of a measurement M on the quantum system $\rho(\theta)$. Then (with
respect to the usual ordering of symmetric positive semi-definite matrices)

$I_M(\theta) \le I_Q(\theta).$

The result in this form was proved by Braunstein and Caves (1994) for a
one-dimensional parameter, but the general result is an easy consequence of
this by considering the information for arbitrary linear combinations. As a
corollary one obtains Helstrom's original form of the theorem as a lower bound
to the variance of an unbiased estimator of $\theta$ based on the outcome of
an arbitrary measurement M.

The proof is just as for the ordinary Cramér-Rao bound, an exercise in using
the Cauchy-Schwarz inequality, but now with the complex inner product
trace $X^*Y$ between two self-adjoint matrices. And just as in the usual proof
of the Cramér-Rao inequality, as a by-product the proof shows that equality
holds, for a one-dimensional parameter, if (though not quite if and only if)
M is actually a simple measurement of the observable $\lambda$:

Exercise 1.11 (Optimal M for 1-d $\theta$) Show for one-dimensional $\theta$
that if M is the simple measurement of the observable $\lambda$, i.e., its
values are in one-to-one correspondence with the eigenspaces of $\lambda$ and
each M(x) is the projection onto the corresponding eigenspace, then
$I_M = I_Q$.

There is a complication when using this result. Typically $\lambda$ will
depend on $\theta$, and typically in such a strong way that the eigenspaces of
$\lambda$ (and not just the eigenvalues) depend on $\theta$. Thus the best
measurement of $\theta$ in terms of Fisher information depends on the true
value of $\theta$. However things are very simple in the following example:
Exercise 1.12 Suppose all $\rho(\theta)$ commute, i.e., have common
eigenspaces. Show that the $\lambda_i(\theta)$ then also commute for all i and
$\theta$. Show that a simple measurement of the common eigenspaces of all
these matrices has Fisher information equal to the quantum information for all
values of $\theta$.

The above is actually a completely classical model where
$\rho = \sum_i p_i(\theta)\,|i\rangle\langle i|$, i.e., a classical mixture,
with mixing distribution depending on $\theta$, of the fixed pure states
$|i\rangle\langle i|$. The optimal measurement is to measure which of these
pure states the system is in; that can best be done using the projector-valued
probability measure with elements $|i\rangle\langle i|$, resulting in the
outcome 'i' with probability $p_i(\theta)$. The quantum information matrix is
the Fisher information matrix for this distribution.

The result of Exercise 1.11 does give a lot of hope for a clear solution to
the problem of estimating a one-dimensional parameter, at least for large n,
for the system $\rho^{\otimes n}(a(\theta))$, as was first pointed out by
Barndorff-Nielsen and Gill (1998). Suppose the parameter $\theta$ is
identified, so that there are a finite number of simple measurements, the
distributions of whose outcomes identify $\theta$. For example, in the
spin-half case $\rho = \tfrac12(\mathbf{1} + a(\theta)\cdot\sigma)$,
measurements of $\sigma_1$, $\sigma_2$, $\sigma_3$ result in Bernoulli trials
with probabilities $\tfrac12(1 \pm a_i(\theta))$. Suppose that from consistent
estimators of the $a_i(\theta)$ we can construct a consistent estimator of
$\theta$. Now, use a growing number but vanishing proportion of copies of our
quantum system with which to 'pre-estimate' $\theta$ consistently. Call this
preliminary estimator $\tilde\theta$. Now, compute the quantum score for
$\theta$ at $\tilde\theta$, determine its eigenspaces, and implement the
corresponding simple measurement on all remaining copies of the system. This
gives us an i.i.d. sample from some distribution
$p(\cdot \mid \theta; \tilde\theta)$. Estimate $\theta$ by maximum likelihood
on these observations, conditional on the observed value of $\tilde\theta$.
The result $\hat\theta$ will be an estimator approximately normally
distributed about $\theta$ with variance approximately
$1/(n I(\theta; \tilde\theta))$, where $I(\theta; \tilde\theta)$ is the Fisher
information for $\theta$ in one of these observations, given $\tilde\theta$.
Now for n large we have arranged that $\tilde\theta$ is close to the true
value of $\theta$. We may hope that the eigenspaces of $\lambda(\tilde\theta)$
are close to the eigenspaces of $\lambda(\theta)$ and hence that the Fisher
information in one observation (one simple measurement) of
$\lambda(\tilde\theta)$ is close to that in one observation of
$\lambda(\theta)$. But the latter achieves the quantum Cramér-Rao bound at
$\theta$. Thus under suitable smoothness conditions $I(\theta; \tilde\theta)$
will be close to $I_Q(\theta)$ and hence the asymptotic distribution of our
final estimator close to normal about $\theta$ with variance
$1/(n I_Q(\theta))$. This is coming close to saying that $\hat\theta$ is
asymptotically optimal.
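
To see how the Fisher information of the second-stage measurement approaches the quantum information as the preliminary estimate improves, consider the following sketch. The model is our own illustrative choice, not one taken from the text: $a(\theta) = r(\cos\theta, 0, \sin\theta)$ with $0 < r < 1$ known. For this model the quantum score at $\tilde\theta$ is $r\,u\cdot\sigma$ with $u = (-\sin\tilde\theta, 0, \cos\tilde\theta)$, so the simple measurement of the score is a spin measurement along u, and $I_Q = r^2$.

```python
import math


def fisher_info(theta, theta_tilde, r):
    """Fisher information I(theta; theta_tilde) of one spin measurement.

    Assumed model: rho(theta) = (1 + a(theta).sigma)/2 with
    a(theta) = r (cos theta, 0, sin theta).  The measurement direction
    u = (-sin theta_tilde, 0, cos theta_tilde) is the eigen-direction of the
    quantum score evaluated at the preliminary estimate theta_tilde.
    """
    delta = theta - theta_tilde
    a_dot_u = r * math.sin(delta)      # outcome probabilities are (1 +- a.u)/2
    da_dot_u = r * math.cos(delta)     # derivative of a.u with respect to theta
    # Bernoulli Fisher information with mean parameter m = a.u: m'^2 / (1 - m^2)
    return da_dot_u ** 2 / (1 - a_dot_u ** 2)


def quantum_info(r):
    """Quantum information of the assumed model: trace(rho score^2) = r^2."""
    return r ** 2


def information_loss(theta, theta_tilde, r):
    """Shortfall I_Q - I(theta; theta_tilde), which is never negative."""
    return quantum_info(r) - fisher_info(theta, theta_tilde, r)
```

In this model the shortfall vanishes when $\tilde\theta = \theta$ and is of second order in $\theta - \tilde\theta$, which is the kind of smoothness the heuristic argument relies on.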

We know that no unbiased estimator of $\theta$ can have smaller variance.
However that does not tell us that no estimator whatever can do better, e.g.,
in terms of mean square error. Indeed the phenomenon of super-efficiency is
just as present here as in ordinary statistics. In order to make a compelling
optimality statement about our estimator we must either restrict attention to
a sub-class of nicely behaved estimators, or make optimality statements which
are of a Bayesian or a minimax nature. A very useful tool, which can be used
in any of these approaches, is the van Trees inequality, which says for a
one-dimensional parameter $\theta$ with prior distribution $\pi(d\theta)$,
under some regularity conditions, that the expected (with respect to the
prior) mean square error of a completely arbitrary estimator of $\theta$ is
bounded below by one divided by the sum of the expected Fisher information for
the parameter and the information, with respect to location, in the prior
distribution. This writer prefers to restrict the class of estimators
according to some regularity condition. We will go into this in more detail in
the next section, but before that, let us consider the multiparameter case. We
will see that a more fundamental complication arises: at a fixed parameter
value, quantum scores for different components of the parameter may not
commute.

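Exercise 1.13 (Quantum information for spin-half models) In Exercises 1.8 and
1.7 we noted the score matrices for the full pure-state model
$\rho = \rho(u)$ and for the full mixed-state model $\rho = \rho(a)$. Show
that, at $u = (0, 0, 1)$ in the first case and at $a = (0, 0, \xi)$ in the
second case, the quantum information matrices for $\theta = (u_1, u_2)$ and
for $\theta = a$ are respectively

(16)    $I_Q = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$  and  $I_Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & (1 - \xi^2)^{-1} \end{pmatrix}.$
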

Now the approach just sketched for the one-parameter case breaks down.
Certainly we can form a preliminary estimator of $\theta$ and thereby
'estimate' the quantum score matrices. Next, in the full pure- and mixed-state
models, one can rotate the coordinate system and reparametrize so that the
quantum scores become $\sigma_x$, $\sigma_y$ (pure-state model), and
$\sigma_x$, $\sigma_y$, $a\mathbf{1} + b\sigma_z$ (mixed-state model). There
is no way, in either model, we can simultaneously measure these observables
since they do not commute. Thus no measurement on a single particle has an
information matrix equal to $I_Q$. The big question is, what is the class of
information matrices $I_M$ which are available? And if we can perform
measurements on the system obtained by combining particles, what scaled
information matrices $I_M^{(n)}/n$ become available? The latter class includes
all of the former class, since the joint measurements include n i.i.d. copies
of measurements on separate particles; moreover these classes are convex and
bounded.

Though all scaled information matrices $I_M^{(n)}/n$ are bounded by $I_Q$, we
cannot expect them, for given n, to contain a single 'best' information. Which
measurement we should choose will depend on the relative accuracy with which
we want to estimate the different components of $\theta$. For instance if in
the pure-state case, close to $\theta = (0, 0)$, we are only interested in
$\theta_1$ we should simply measure $\sigma_x$ on each of our n particles,
yielding the maximum information on $\theta_1$ but no information at all on
$\theta_2$. After we have characterized the class of all information matrices
available, we must specify through some loss function the relative importance
of the different parameters and solve some optimization problem.

2 A new Cramér-Rao type bound

In this section we report on recent results of Gill and Massar (1998),
concentrating on the spin-half situation, and within that case, emphasizing
the full pure-state model and the full mixed-state model. There turns out to
be a striking difference between these two cases. For pure states, there is
asymptotically no advantage in joint measurements on many particles. However
for mixed states there typically is an advantage. How much is still an open
question. The following result should be called a 'Theorem' (in quotes) since
we do not specify regularity conditions and indeed only a 'Proof' exists, not
yet a Proof.

Theorem 2.1 (Achievable information matrices, n = 1) The set of all
information matrices of outcomes of measurements of one spin-half particle for
a smooth model $\rho(\theta)$ is
$\{F : \operatorname{trace}(I_Q^{-1}F) \le 1\}$.

The parameter $\theta$ could be one-, two- or three-dimensional. We suppose
either that we have a pure-state model, or a strictly mixed-state model. The
argument, in Gill and Massar (1998), has two main parts. In the first part we
show that for all M, $F = I_M$ satisfies
$\operatorname{trace}(I_Q^{-1}F) \le d - 1$ (we do not yet need that d = 2).
In the second part we show that, when d = 2, for any F satisfying this
inequality one can construct a measurement M for which $I_M = F$. For d > 2,
not all F satisfying $\operatorname{trace}(I_Q^{-1}F) \le 1$ are achievable,
and it remains open to characterize exactly the class of achievable
information matrices.
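
For a single Stern-Gerlach measurement the bound of Theorem 2.1 can be checked by hand or numerically. The sketch below assumes the full mixed-state model at $\theta = (0, 0, \xi)$, with $I_Q = \operatorname{diag}(1, 1, (1 - \xi^2)^{-1})$ as follows from the scores of Exercise 1.7 and definition (15). It computes the classical Fisher information matrix of the two-outcome measurement $\Pi(\pm u)$ and evaluates $\operatorname{trace}(I_Q^{-1}I_M)$. The function names are illustrative.

```python
def fisher_matrix(probs, grads):
    """Classical Fisher information: sum over outcomes of grad grad^T / prob."""
    k = len(grads[0])
    return [[sum(g[i] * g[j] / p for p, g in zip(probs, grads)) for j in range(k)]
            for i in range(k)]


def spin_measurement_info(u, theta):
    """I_M for the simple measurement Pi(+-u) in the model (1 + theta.sigma)/2."""
    m = sum(t * c for t, c in zip(theta, u))        # theta.u
    probs = [(1 + m) / 2, (1 - m) / 2]              # p(+) and p(-)
    grads = [[c / 2 for c in u], [-c / 2 for c in u]]
    return fisher_matrix(probs, grads)


def trace_ratio(u, xi):
    """trace(I_Q^{-1} I_M) at theta = (0, 0, xi) for spin measured along u."""
    i_m = spin_measurement_info(u, (0.0, 0.0, xi))
    i_q_inverse = (1.0, 1.0, 1 - xi * xi)           # inverse of the diagonal I_Q
    return sum(i_q_inverse[i] * i_m[i][i] for i in range(3))
```

For every unit vector u the result is 1, so each simple spin measurement sits on the boundary of the set in Theorem 2.1; which boundary point it occupies depends on u.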

For the first part a series of preparatory steps are taken to bring us,
'without loss of generality', to a situation that allows exact computations.
For simplicity take d = 2. If $\rho(\theta)$ lies in the interior of the unit
ball, and $\theta$ has dimension one or two, one can augment $\theta$ with
other parameters, raising its dimension to 3. This can be done in such a way
that the cross-information elements in the augmented $I_Q(\theta)$ are all
zero. It then suffices to prove the inequality for $\theta$ of dimension 3,
and then we may as well use the natural parametrization
$\rho(\theta) = \tfrac12(\mathbf{1} + \theta\cdot\sigma)$ with
$\|\theta\| < 1$ since the quantity $\operatorname{trace}(I_Q^{-1}F)$ is
invariant under smooth reparametrization. If on the other hand $\rho(\theta)$
is a pure-state model we can in the same way, after augmenting $\theta$,
assume that $\theta$ has dimension 2 and after reparametrization the model is
$\rho(\theta) = \tfrac12(\mathbf{1} + \theta\cdot\sigma)$ with
$\|\theta\| = 1$.
For the next preparatory step we need the concepts of refinement and
coarsening of a measurement.

Definition 2.1 (Coarsening and refinement) A measurement M with sample space
$\mathcal{X}$ is a refinement of M' with sample space $\mathcal{Y}$ (and M' is
a coarsening of M) if a measurable function
$f : \mathcal{X} \to \mathcal{Y}$ exists with $M'(B) = M(f^{-1}(B))$.

The result of measurement of M' then has the same distribution as taking f of
the outcome of measurement of M. It follows that the Fisher information in the
outcome of M' is less than or equal to that in M since under coarsening of
data, Fisher information can only decrease.

Now we show that any measurement M' has a refinement M for which
$M(A) = \int_A M(x)\,\mu(dx)$ for some nonnegative operator-valued function M
and bounded measure $\mu$ and for which M(x) has rank one for all x, thus
$M(x) = |\psi(x)\rangle\langle\psi(x)|$ for some (not necessarily normalised)
vector function $|\psi(x)\rangle$. Consequently it will suffice to prove the
result for such maximally refined measurements M. Start with the measurement
M' with sample space $\mathcal{Y}$. Define a probability measure $\nu$ on
$\mathcal{Y}$ by $\nu(B) = \operatorname{trace}(M'(B))/d$; by taking
Radon-Nikodym derivatives one can define $M'(y)$ such that
$M'(B) = \int_B M'(y)\,\nu(dy)$. Since the rank of $M'(y)$ is finite,
$M'(y) = \sum_i M_i(y)$ where each $M_i(y)$ has rank one. Now refine the
original sample space $\mathcal{Y}$ to
$\mathcal{X} = \mathcal{Y}\times\{1, \ldots, d\}$, defining
$M(A\times\{i\}) = \int_A M_i(y)\,\nu(dy)$. Equivalently
$M(A) = \int_A M_i(x)\,\mu(dx, di)$ where $\mu$ is the product of $\nu$ with
counting measure.

This brings us to the situation where the model is either full pure-state or
full mixed-state, and where the measurement is maximally refined. We take the
natural parametrization of either of these models, and without loss of
generality work at a point $\theta$ where $\theta = (0, 0)$ or
$(0, 0, \xi)$. This is possible by the result of Exercise 1.5. Now we have a
formula for $I_Q$ and for the derivatives of $\rho$ with respect to the
components of $\theta$, both in the pure and the mixed case, and we have a
representation for M in terms of a collection of vectors $\psi(x)$ which must
satisfy the normalization constraint
$\int_{\mathcal{X}} |\psi(x)\rangle\langle\psi(x)|\,\mu(dx) = \mathbf{1}$ but
which are otherwise arbitrary. Both $\rho$ and $I_Q$ are diagonal. We simply
compute $\operatorname{trace} I_Q^{-1}I_M$ and show that it equals 1 in the
case d = 2. We leave the details as an exercise for the diligent reader; the
computation is not difficult but does not seem all that illuminating either.
We would dearly like to know if there is a more insightful way to get this
result!

The same arguments work for arbitrary d though the details are more
complicated; a full mixed-state model has $\tfrac12 d(d+1)$ parameters, a full
pure-state model $\tfrac12 d(d+1) - (d-1)$ parameters, and a careful
parametrization is needed to make $I_Q$ diagonal.

In the second part (for d = 2 only) it is shown that for any F satisfying
$\operatorname{trace}(I_Q^{-1}F) \le 1$, one can construct a measurement M for
which $I_M = F$. This measurement will be described in the next section. It
typically depends on the point $\theta$ so a multi-stage procedure is going to
be necessary to achieve asymptotically this information bound. That will be
the main content of the next section, where we do some quantum asymptotics
proving asymptotic optimality results for $n \to \infty$ of the resulting
two-stage procedure.

We only have partial results for n > 1. In two special cases the available 
scaled information matrices do not increase as n increases. One of these 
cases is the case of pure-state models. This case has been much studied in 
the literature and is of great practical importance. The other case is when we 
make a restriction on the class of measurements to measurements of product 
form (in the literature also sometimes called an unentangled measurement). 
We first define this notion and then explain its significance. 

Definition 2.2 (Product-form measurements) We say that a measurement on n
copies of a given quantum system is of product form if
$M^{(n)}(A) = \int_A M^{(n)}(x)\,\mu(dx)$ for a real measure $\mu$ and
matrix-valued function $M^{(n)}(x)$ where $M^{(n)}(x)$ is of the form
$M_1(x)\otimes\cdots\otimes M_n(x)$, with nonnegative components.

We described in the previous section a measurement procedure whereby 
we first carried out measurements on some of our n particles, and then de- 
pending on the outcome, carried out other measurements on the remaining 
particles. Altogether this procedure constitutes one measurement on the 
joint system of n particles taking values in some n-fold product space. One 
can conceive of more elaborate schemes where depending on the results at 
any stage, one decides, possibly with the help of some classical randomi- 
sation, which particle to measure next and how. It would be allowed to 
measure again a particle which had previously been subject to measure- 
ment. There exists a general description of the state of a quantum system 
after measurement, allowing one to piece all the ingredients together into one 
measurement of the combined system. A measurement which can be decom- 
posed into separate steps consisting of measurements on separate particles 
only, is called a separable measurement. 

It turns out that all separable measurements (provided all outcomes of 
the component steps are encoded in the overall outcome x) have product- 
form. On the other hand, product-form measurements exist which are not 
separable, see Bennett et al. (1998). The product-form measurements form a 
large and interesting class, including all measurements which can be carried 
out sequentially on separate particles as well as more besides. 

In the notion of separable measurement it is insisted that all intermediate 
outcomes are included in the final outcome. If one throws away some of the 
data, one gets an outcome whose distribution is the same as the distribution 
of a coarsening of the original measurement. Coarsening of a measurement 
can easily destroy the properties of being separable or being of product-form.
This is some explanation for the complicated restriction to measurements 
which can be refined to product-form in the following theorem: 

Theorem 2.2 (Achievable information matrices, n > 1) The scaled information
matrices of measurements on a smooth model $\rho^{\otimes n}(\theta)$ remain
$\{F : \operatorname{trace}(I_Q^{-1}F) \le 1\}$

1. in a pure-state spin-half model;

2. in a mixed-state spin-half model with the class of measurements restricted
to measurements which can be refined to product form.

The theorem is proved exactly as before, again finishing in an unilluminating
calculation.

We have a counterexample to the conjecture that, for mixed states, the bound
holds for all measurements. In the case n = 2, at the point
$\rho = \tfrac12\mathbf{1}$, there is a measurement for which
$\operatorname{trace}(I_Q^{-1}I_M^{(2)}/2) = 3/2$, thus 50% more information
in an appropriate measurement of two identical particles than in any
combination of separate measurements of the two. What the set of achievable
scaled information matrices looks like and whether it continues to grow (and
to what limit) as n grows is completely unknown.

The measurement has seven elements, the first six of the form
$\tfrac12\Pi_{[\psi]}$, and the seventh $\Pi_{[\psi]}$. The various $\psi$ are
$|{+z}{+z}\rangle$, $|{-z}{-z}\rangle$, $|{+x}{+x}\rangle$,
$|{-x}{-x}\rangle$, $|{+y}{+y}\rangle$, $|{-y}{-y}\rangle$, $|S\rangle$. By
$|{+z}{+z}\rangle$ we mean
$|{+z}\rangle\otimes|{+z}\rangle = \psi(e_z)\otimes\psi(e_z)$ and similarly
for the next five. The last $\psi$ is the so-called singlet state
$\tfrac{1}{\sqrt2}\big(|{+z}\rangle\otimes|{-z}\rangle - |{-z}\rangle\otimes|{+z}\rangle\big)$.
As a pure state of two interacting spin-half particles, this is the famous
entangled state resulting in the violation of the Bell inequalities, and hence
of locality (according to some interpretations). Here it arises as part of a
measurement of two completely non-interacting particles; however this
measurement can never be implemented by doing separate operations on the
separate particles.
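
The value 3/2 can be confirmed by direct computation. The sketch below builds the seven elements as 4 x 4 matrices, taking $\Pi_{[\psi]}$ to mean the projector $|\psi\rangle\langle\psi|$, and evaluates the Fisher information of the outcome for the model $\rho(\theta)\otimes\rho(\theta)$ with $\rho(\theta) = \tfrac12(\mathbf{1} + \theta\cdot\sigma)$ at $\theta = 0$, where $I_Q$ is the 3 x 3 identity. The helper names and the ordering of the basis are our own choices.

```python
import math

SIGMA = (
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
)
ID2 = [[1, 0], [0, 1]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]


def trace_product(a, b):
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def spin_state(axis, sign):
    """Eigenvector of sigma_axis (0 = x, 1 = y, 2 = z) with eigenvalue sign."""
    s = 1 / math.sqrt(2)
    if axis == 0:
        return [s, sign * s]
    if axis == 1:
        return [s, sign * 1j * s]
    return [1, 0] if sign > 0 else [0, 1]


def projector(psi, weight=1.0):
    """weight * |psi><psi| as a nested list."""
    return [[weight * x * y.conjugate() for y in psi] for x in psi]


def seven_element_measurement():
    elements = []
    for axis in (2, 0, 1):                         # z, x, y as in the text
        for sign in (+1, -1):
            one = spin_state(axis, sign)
            pair = [x * y for x in one for y in one]   # |psi> (x) |psi>
            elements.append(projector(pair, 0.5))
    s = 1 / math.sqrt(2)
    elements.append(projector([0, s, -s, 0]))      # singlet state |S>
    return elements


def scaled_information_trace():
    """trace(I_Q^{-1} I_M^{(2)} / 2) at theta = 0, where I_Q is the identity."""
    rho2 = [[0.25 if i == j else 0.0 for j in range(4)] for i in range(4)]
    d_rho2 = []                                    # derivatives of rho (x) rho
    for s in SIGMA:
        left, right = kron(s, ID2), kron(ID2, s)
        d_rho2.append([[(left[i][j] + right[i][j]) / 4 for j in range(4)]
                       for i in range(4)])
    total = 0.0
    for m in seven_element_measurement():
        p = trace_product(rho2, m).real
        grad = [trace_product(d, m).real for d in d_rho2]
        total += sum(g * g for g in grad) / p      # this outcome's share of trace I_M
    return total / 2
```

Each of the six product elements has probability 1/8 and contributes 1/2 to the trace of $I_M^{(2)}$, while the singlet outcome has probability 1/4 and zero gradient at $\theta = 0$, so it contributes nothing to the information even though it is needed to complete the measurement.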

Similar examples occur in the paper of Vidal et al. (1998), extending the 
pure-state results of Massar and Popescu (1995) to mixed states. 

3 Quantum asymptotics 

The results of the previous section are in the form of a bound on the
information matrix based on the outcome of any measurement (perhaps restricted
to the class of product-form measurements) on n identical copies of a given
spin-half quantum system with state depending on an unknown parameter
$\theta$. We will now explain how such a bound can be used to give asymptotic
bounds on the quality of estimators based on those measurements. Furthermore,
we show how the bounds can be achieved by a two-stage procedure using simple
measurements on separate particles only. As far as achieving the bounds is
concerned, only for the full mixed-state model under the natural
parametrization is the problem completely solved. For the other models, the
results are conjectural.

We will discuss two kinds of bounds: firstly, a bound on the limiting 
scaled mean quadratic error matrix of a well-behaved sequence of estima- 
tors, and secondly, a bound on the mean quadratic error matrix of the lim- 
iting distribution of a well-behaved sequence of estimators. Each has its 
advantages and disadvantages. In particular, since the delta-method works 
for (the variance of) limiting distributions but not for limiting mean square 
errors, stronger conditions are needed to prove optimality of some procedure 
in the first sense than in the second sense. 

3.1 Two asymptotic bounds 

Obviously a bound on the information matrix, by the ordinary Cramér-Rao
inequality, immediately implies a bound on the covariance matrix of an
unbiased estimator. However this is not a restriction we want to make. It
turns out to be much more convenient to work via a Bayesian version of the
Cramér-Rao inequality due to van Trees (1968), as generalised to the
multiparameter case by Gill and Levit (1995). For a one-dimensional parameter
the van Trees inequality is easy to state: the Bayes quadratic risk is bounded
below by one over the sum of the expected information and the information in
the prior. In the multiparameter case one has a whole collection of
inequalities corresponding to different choices of quadratic loss function and
some other parameters, more difficult to interpret.

Let $\pi(\theta)$ be a prior density for the p-dimensional parameter
$\theta$, which we suppose to be sufficiently smooth and supported by a
compact and smoothly bounded region of the parameter space; see Gill and Levit
(1995) for the precise requirements. Let $C(\theta)$ be a $p \times p$
symmetric positive definite matrix (C stands for cost function) and let
$V^{(n)}(\theta)$ be the mean quadratic error matrix of a chosen estimator
$\hat\theta^{(n)}$ of $\theta$ based on a measurement of n copies of the
quantum system. Letting $\Theta$ denote a random drawing from the prior
distribution $\pi$, it follows that
$E\operatorname{trace} C(\Theta)V^{(n)}(\Theta)$ is the Bayes risk of the
estimator with respect to the loss function
$(\hat\theta^{(n)} - \theta)^T C(\theta)(\hat\theta^{(n)} - \theta)$.

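Let $D(\theta)$ be another $p \times p$ matrix function of $\theta$. Let
$I_M^{(n)}(\theta)$ denote the Fisher information matrix in the measurement.
Then the multivariate van Trees inequality reads

(17)    $E\operatorname{trace} C(\Theta)\,nV^{(n)}(\Theta) \ge \dfrac{\big(E\operatorname{trace} D(\Theta)\big)^2}{E\operatorname{trace} C(\Theta)^{-1}D(\Theta)\big(I_M^{(n)}(\Theta)/n\big)D(\Theta)^T + \tilde{\mathcal{I}}(\pi)/n}$

where

(18)    $\tilde{\mathcal{I}}(\pi) = \int \dfrac{1}{\pi(\theta)} \sum_{ijkl} C(\theta)^{-1}_{ij}\, \dfrac{\partial}{\partial\theta_k}\big\{D_{ik}(\theta)\pi(\theta)\big\}\, \dfrac{\partial}{\partial\theta_l}\big\{D_{jl}(\theta)\pi(\theta)\big\}\, d\theta.$
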

On invoking Theorem 2.2 we have the bound
$\operatorname{trace} I_Q^{-1}(\theta)\big(I_M^{(n)}(\theta)/n\big) \le 1$
(provided that, in the mixed case, we restrict attention to measurements
refinable to product form). We are going to assume that our sequence of
measurements and estimators is such that the normalized mean quadratic error
matrix $nV^{(n)}(\theta)$ converges sufficiently regularly to a limit
$V(\theta)$. Our aim is to transfer the just mentioned bound to V, obtaining
the bound $\operatorname{trace} I_Q^{-1}(\theta)V(\theta)^{-1} \le 1$.

We will do this by making appropriate choices of C and D. We will need
regularity conditions both on the sequence of estimators and on the model
$\rho(\theta)$ in order to carry over equation (17) to the limit.

Theorem 3.1 (Asymptotic Cramér-Rao 1) Suppose that on some open set of
parameter values $\theta$:

1. $nV^{(n)}$ converges uniformly to a continuous limit V.

2. $I_Q(\theta)$ is continuous with bounded partial derivatives.

3. V and $I_Q$ are non-singular.

Then the limiting normalised mean quadratic error matrix satisfies

(19)    $\operatorname{trace} I_Q^{-1}(\theta)V(\theta)^{-1} \le 1.$

We outline the proof of the theorem as follows. First of all, we pick a point
$\theta_0$ and define $V_0 = V(\theta_0)$. Next we define

(20)    $C(\theta) = V_0^{-1}I_Q^{-1}(\theta)V_0^{-1},$

(21)    $D(\theta) = V_0^{-1}I_Q^{-1}(\theta).$

With these choices (17) becomes

(22)    $E\operatorname{trace} V_0^{-1}I_Q^{-1}(\Theta)V_0^{-1}\,nV^{(n)}(\Theta) \ge \dfrac{\big(E\operatorname{trace} V_0^{-1}I_Q^{-1}(\Theta)\big)^2}{E\operatorname{trace} I_Q(\Theta)^{-1}\big(I_M^{(n)}(\Theta)/n\big) + \tilde{\mathcal{I}}(\pi)/n}.$

We can bound the first term in the denominator of the right hand side by 1, by
the results of the last section. The second term in the denominator of the
right hand side is finite, by our third assumption, and for $n \to \infty$ it
converges to zero. By our first assumption (22) converges to

(23)    $E\operatorname{trace} V_0^{-1}I_Q^{-1}(\Theta)V_0^{-1}V(\Theta) \ge \big(E\operatorname{trace} V_0^{-1}I_Q^{-1}(\Theta)\big)^2.$

Now replace the prior density $\pi$ by one in a sequence of priors,
concentrating on smaller and smaller neighbourhoods of $\theta_0$. Using the
continuity assumptions on V and $I_Q$, we obtain from (23) the inequality

$\operatorname{trace} V_0^{-1}I_Q^{-1}(\theta_0)V_0^{-1}V_0 \ge \big(\operatorname{trace} V_0^{-1}I_Q^{-1}(\theta_0)\big)^2,$

or in other words, with $\theta = \theta_0$, the required

(24)    $\operatorname{trace} I_Q^{-1}(\theta)V^{-1}(\theta) \le 1.$

In some situations it might be more convenient to have a bound on the 
mean quadratic error of a limiting distribution, assuming one to exist. At 
the moment of writing we believe the following: 

Theorem 3.2 (Asymptotic Cramér-Rao 2) Suppose

1. $\hat\theta^{(n)}$ is Hájek regular at $\theta$ at root n rate.

2. If Z has the limiting distribution of
$\sqrt{n}(\hat\theta^{(n)} - \theta)$, then the mean quadratic error matrix of
the limiting distribution $V = E(ZZ^T)$ is non-singular.

3. $I_Q$ is non-singular.

Then V satisfies

(25)    $\operatorname{trace} I_Q^{-1}(\theta)V(\theta)^{-1} \le 1.$

The proof should follow the lines of the similar result in Gill and Levit
(1995), with a prior distribution concentrating on a root n neighbourhood of
the truth. We will need similar choices of C and D as in the proof of Theorem
3.1 though the dependence of D on $\theta$ can now be suppressed.

3.2 Achieving the asymptotic bounds 

At present we have essentially complete results in the full mixed-state
spin-half model with the natural parametrization. We believe they can be
extended to smooth ($C^1$) pure- and mixed-state models.

Give yourself a target mean quadratic error matrix $W(\theta)$ satisfying

(26)    $\operatorname{trace} I_Q(\theta)^{-1}W(\theta)^{-1} \le 1.$

Is there a sequence of measurements satisfying the conditions of Theorems 3.1
or 3.2 with limiting mean quadratic error matrix $V(\theta)$ equal to the
target?

Possibly we do not start with a target W but a step earlier, with a quadratic
cost function. For given $C(\theta)$ it is straightforward to compute the
matrix $W(\theta)$ which minimizes
$\operatorname{trace} C(\theta)W(\theta)$ subject to the constraint (26); the
solution is
$W = \operatorname{trace}\big((I_Q^{-1/2}CI_Q^{-1/2})^{1/2}\big)\, I_Q^{-1/2}(I_Q^{-1/2}CI_Q^{-1/2})^{-1/2}I_Q^{-1/2}$.
Now we pose the same question again, with the W we have just calculated as
target.
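
A quick sanity check of this minimizer is possible in the special case where C and $I_Q$ are both diagonal, say $C = \operatorname{diag}(c_i)$ and $I_Q = \operatorname{diag}(q_i)$. The sketch below assumes that the solution then reduces to the diagonal matrix with entries $W_{ii} = \big(\sum_j\sqrt{c_j/q_j}\big)/\sqrt{c_iq_i}$; this reduction is our own calculation, and the function names and numerical values are illustrative.

```python
import math


def optimal_w_diagonal(c, q):
    """Diagonal of the minimizing W when C = diag(c) and I_Q = diag(q)."""
    # s is the trace of (I_Q^{-1/2} C I_Q^{-1/2})^{1/2} in the diagonal case
    s = sum(math.sqrt(ci / qi) for ci, qi in zip(c, q))
    return [s / math.sqrt(ci * qi) for ci, qi in zip(c, q)]


def constraint(w, q):
    """trace(I_Q^{-1} W^{-1}) for diagonal W and I_Q; must not exceed 1."""
    return sum(1 / (qi * wi) for qi, wi in zip(q, w))


def cost(c, w):
    """trace(C W) for diagonal C and W."""
    return sum(ci * wi for ci, wi in zip(c, w))


# Hypothetical example: I_Q at theta = (0, 0, 0.6) and an arbitrary cost.
q_example = [1.0, 1.0, 1 / (1 - 0.36)]
c_example = [1.0, 4.0, 2.0]
w_example = optimal_w_diagonal(c_example, q_example)
```

The constraint holds with equality at `w_example`, and any other feasible diagonal W, for instance $W = 3I_Q^{-1}$, gives a cost at least as large. This checks the formula only against diagonal competitors, so it is a plausibility test rather than a proof.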

Let us call $F = W^{-1}$ the target information matrix. First we pretend
$\theta$ is known and exhibit a measurement M on a single particle with the
target information matrix at the given parameter value.

In the previous section we omitted explaining how the bound of Theorem 2.1 can
be attained. That theorem stated that, at a given parameter value, for any
positive-semidefinite symmetric F satisfying
$\operatorname{trace} I_Q^{-1}F \le 1$ there is a measurement M on a single
spin-half particle with $I_M = F$. What is that measurement? We describe it in
the case of a full mixed-state spin-half model with the natural
parametrization, thus
$\rho(\theta) = \tfrac12(\mathbf{1} + \theta\cdot\sigma)$. The matrices $I_Q$
and F are $3 \times 3$.

To start with, we compute the eigenvector-eigenvalue decomposition of
$I_Q^{-1/2} F I_Q^{-1/2}$, obtaining orthonormal eigenvectors $h_i$ and
nonnegative eigenvalues $\gamma_i$, say. The condition on $F$ translates to
$\sum_i \gamma_i \le 1$. Now define $g_i = I_Q^{1/2} h_i$ and three unit
vectors $u_i = g_i/\|g_i\|$, and finally consider the measurement $M$ taking
seven different values, whose elements are $\gamma_i \Pi(\pm u_i)$,
$i = 1,2,3$, and $(1 - \sum_i \gamma_i)\mathbf{1}$.

It turns out by a straightforward computation (carried out, without loss of
generality, at $\theta = (0, 0, \xi)$) that the measurement with the two
elements $\Pi(\pm u_i)$ has information matrix $g_i \otimes g_i$, and hence
the measurement $M$ has information matrix $\sum_i \gamma_i\, g_i \otimes g_i = F$.
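
The construction can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it takes $I_Q = \mathbf{1} + \theta\theta^\top/(1-\|\theta\|^2)$ for the natural parametrization, uses the classical Fisher information $u u^\top/(1-(u\cdot\theta)^2)$ of a spin measurement along a unit vector $u$ with $P(\text{up}) = \tfrac12(1+u\cdot\theta)$, and picks an arbitrary target $F$ scaled so that $\operatorname{trace} I_Q^{-1}F = 1$. All function names and numerical values are hypothetical.

```python
import numpy as np

def sym_power(A, p):
    """Power of a symmetric positive-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * w**p) @ v.T

def design_measurement(I_Q, F):
    """Weights gamma_i and unit directions u_i (as rows) for a target information F."""
    R = sym_power(I_Q, -0.5)                   # I_Q^{-1/2}
    gamma, H = np.linalg.eigh(R @ F @ R)       # columns of H are the h_i
    gamma = np.clip(gamma, 0.0, None)          # remove tiny negative round-off
    G = sym_power(I_Q, 0.5) @ H                # columns g_i = I_Q^{1/2} h_i
    U = G / np.linalg.norm(G, axis=0)          # columns u_i = g_i / ||g_i||
    return gamma, U.T

def spin_information(theta, u):
    """Fisher information about theta from a spin measurement along the unit vector u."""
    return np.outer(u, u) / (1.0 - (u @ theta) ** 2)

# Hypothetical parameter value and target information matrix.
theta = np.array([0.0, 0.0, 0.6])
I_Q = np.eye(3) + np.outer(theta, theta) / (1.0 - theta @ theta)   # assumed form of I_Q
F_raw = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
F = F_raw / np.trace(np.linalg.inv(I_Q) @ F_raw)   # now trace(I_Q^{-1} F) = 1

gamma, U = design_measurement(I_Q, F)
# Information of the randomized measurement: mixture of the three spin measurements.
I_M = sum(g * spin_information(theta, u) for g, u in zip(gamma, U))
print("sum of gamma_i:", gamma.sum())
print("max |I_M - F| :", np.abs(I_M - F).max())
```

With a target on the boundary of the constraint the weights sum to one, so the "do nothing" outcome has probability zero.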

This seven-outcome measurement can be implemented as a randomized
choice between three simple measurements: with probability $\gamma_i$ measure
spin in the direction $u_i$, with probability $1 - \sum_i \gamma_i$ do nothing.

However in practice this measurement is not available since the directions
$u_i$ and probabilities $\gamma_i$ depend on the unknown $\theta$. We therefore
take recourse to the following two-stage measurement procedure.

First measure spin in the $x$, $y$ and $z$ directions on $\tfrac13 n^\alpha$
of the particles each, where $0 < \alpha < 1$ is fixed and the numbers are
rounded to whole numbers. The expected relative frequency of 'up' particles in
each direction is $\tfrac12(1 + \theta_i)$, $i = 1,2,3$, so solving 'observed
equals expected' yields a consistent preliminary estimator $\tilde\theta$ of
$\theta$. If the estimate lies outside the unit ball, project onto the ball
and stop. With large probability no projection is necessary. We can compute
the eigenvalue-eigenvector decomposition of
$I_Q^{-1/2}(\tilde\theta)\, F(\tilde\theta)\, I_Q^{-1/2}(\tilde\theta)$, leading
to fractions $\tilde\gamma_i$ and directions $\tilde u_i$ as above. Measure the
spin of a fraction $\tilde\gamma_i$ of the remaining particles in the direction
$\tilde u_i$. Solve again the three (linear) equations 'observed relative
frequency equals expected', treating the $\tilde u_i$ as fixed. Project onto
the unit ball if necessary, yielding an estimator $\hat\theta$.
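
The two-stage procedure is easy to simulate. The following Python sketch is an illustration under our own assumptions, not a definitive implementation: it uses the closed form $I_Q = \mathbf{1} + \theta\theta^\top/(1-\|\theta\|^2)$, takes as target the information matrix $F = W^{-1}$ that is optimal for the cost matrix $C = \mathbf{1}$ (which gives $F = I_Q^{1/2}/\operatorname{trace} I_Q^{-1/2}$), fixes $\alpha = 0.7$, and assumes all fractions are strictly positive. The function names, the true parameter value and the sample sizes are hypothetical.

```python
import numpy as np

def sym_power(A, p):
    """Power of a symmetric positive-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * w**p) @ v.T

def quantum_information(theta):
    # Assumed form of I_Q for rho = (1 + theta.sigma)/2, natural parametrization.
    return np.eye(3) + np.outer(theta, theta) / (1.0 - theta @ theta)

def target_information(theta):
    # Hypothetical target: F = W^{-1} for the cost matrix C = identity.
    root = sym_power(quantum_information(theta), 0.5)      # I_Q^{1/2}
    return root / np.trace(np.linalg.inv(root))

def project_to_ball(v):
    r = np.linalg.norm(v)
    return v / r if r > 1.0 else v

def up_fraction(rng, theta, u, m):
    """Relative frequency of 'up' when spin along u is measured on m particles."""
    return rng.binomial(m, 0.5 * (1.0 + u @ theta)) / m

def two_stage_estimate(rng, theta_true, n, alpha=0.7):
    # Stage 1: about n^alpha / 3 particles along each of x, y and z.
    m0 = max(1, round(n**alpha / 3))
    prelim = np.array([2 * up_fraction(rng, theta_true, e, m0) - 1 for e in np.eye(3)])
    if np.linalg.norm(prelim) >= 1.0:          # on or outside the ball: project and stop
        return project_to_ball(prelim)
    # Stage 2: fractions and directions computed at the preliminary estimate.
    I_Q = quantum_information(prelim)
    R = sym_power(I_Q, -0.5)
    gamma, H = np.linalg.eigh(R @ target_information(prelim) @ R)
    G = sym_power(I_Q, 0.5) @ H
    U = (G / np.linalg.norm(G, axis=0)).T      # rows are the directions u_i
    counts = np.maximum(1, np.round(gamma * (n - 3 * m0)).astype(int))
    freqs = np.array([up_fraction(rng, theta_true, u, m) for u, m in zip(U, counts)])
    # 'Observed equals expected': (1 + u_i . theta) / 2 = freq_i for i = 1, 2, 3.
    return project_to_ball(np.linalg.solve(U, 2 * freqs - 1))

rng = np.random.default_rng(0)
theta_true = np.array([0.2, -0.3, 0.5])
n = 20000
estimates = np.array([two_stage_estimate(rng, theta_true, n) for _ in range(400)])
errors = estimates - theta_true
V_hat = n * errors.T @ errors / len(errors)    # scaled mean quadratic error matrix
W_target = np.linalg.inv(target_information(theta_true))
print("trace V_hat / trace W:", np.trace(V_hat) / np.trace(W_target))
```

For finite $n$ the ratio should sit somewhat above one, since a fraction of order $n^{\alpha-1}$ of the particles is spent on the preliminary estimate.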

Our claim is that this procedure exhibits a measurement on the $n$
particles, and an estimator $\hat\theta$ based on its outcome, which satisfies
the conditions of Theorem 3.1, with $V(\theta)$ equal to the target $W(\theta)$.
Thus the bound of Theorem 3.1 is also achievable, and a measurement which does
this has been explicitly described above. Apart from projecting onto the
unit ball the estimator involves only linear operations on binomial variables,
so it is not difficult to analyse explicitly. We need a preliminary sample size
$\tilde n$ of order $n^\alpha$ and not, for example, of order $\log n$, in order
to control the scaled mean quadratic error of the estimator. There is an
exponentially small probability (in $\tilde n$, not in $n$) that the preliminary
estimate is outside of a given neighbourhood of the truth, and hence that the
scaled quadratic error is of order $n$.

One can further check that the estimator we have described also satisfies 
the conditions of Theorem 3.2. 

Possibly one is interested in a different parametrization of the model. Under
a smooth ($C^1$) reparametrization, the delta method allows us to maintain
optimality in the sense of Theorem 3.2. However optimality in the sense 
of Theorem 3.1 could be destroyed; in order for it to be maintained the 
reparametrization should also be bounded. Alternatively one must modify 
the estimator by a truncation at a level increasing slowly enough to infinity 
with n, cf. Schipper (1997; section 4.4) or Levit and Oudshoorn (1993) for 
examples of the technique. 

This approach can be extended to other spin-half models. The difficulties
are exemplified by the case of the two-parameter full pure-state spin-half
model. Locally, consider the natural parametrization $\theta = (\theta_1, \theta_2)$,
$\theta_3 = (1 - \theta_1^2 - \theta_2^2)^{1/2}$, $\rho = \rho(\theta)$ at the point
$\theta = (0, 0)$. The quantum information matrix for three parameters
$\theta_1, \theta_2, \theta_3$ contains an infinite element. However the recipe
outlined above continues to work if we add to a given $2 \times 2$ target
information matrix a third zero row and column: infinities always get
multiplied by zero. The third fraction $\gamma_3 = 0$, so simple measurements in
just two directions suffice.

The resulting procedure involves linear operations on binomial counts, 
projecting onto S, and reparametrization. Under some smoothness we should 
finish with an estimator optimal in the sense of Theorem 3.2; under further 
smoothness, boundedness, and a sufficiently large preliminary sample also 
optimality in the sense of Theorem 3.1 should hold. 

If the target information matrix includes some zeros, i.e., one is not
interested at all in certain parameters, the results should still go through;
the preliminary sample should be of size of order $n^\alpha$, $\tfrac12 < \alpha < 1$,
in order that the uncertainty in the initial estimate of the 'nuisance
parameters' does not contaminate the final result.

4 Non-locality without entanglement 

It would take us too far afield here to explain the notions of entanglement and 
of non-locality. For some kind of introduction see Kümmerer and Maassen
(1998) and Gill (1998), and Gill (1995a, 1995b); see also the books of Peres 
(1995), Isham (1995), Penrose (1994), Maudlin (1994). However we would 
like to discuss whether or not our finding, that non-separable joint measure- 
ments on several independent (non-entangled) quantum particles can yield 
more information than any separate measurements on the separate particles,
should be considered surprising or not. Recall that separable measurements, 
cf. Bennett et al. (1998), are measurements which can be decomposed into a 
sequence of measurements on separate particles, each measurement possibly 
depending on the outcome of the preceding ones, and whereby it is allowed 
to measure further a particle which has already been measured (and hence 
its state has been altered in a particular way) at an earlier step. 

From a mathematical point of view there should not be much surprise.
The class of separable measurements is contained in the class of product- 
form measurements, which is clearly a very small part of the space of all 
measurements whatsoever. The optimisation problem of maximising Fisher 
information (more precisely, some scalar functional thereof) must only be 
expected to have a larger outcome when we optimise over a larger space. 
The surprise for the mathematician is rather that for pure states, and for
one-dimensional parameters, there is no gain in joint measurements. And it
is strange that mixed states should exhibit this phenomenon whereas pure
states do not: the difference is classical probabilistic mixing, which should
not lead to nonclassical behaviour.

However physicists are and should be surprised. The reason is connected 
to the feeling of many physicists that the randomness in measurement of a 
quantum system should have a deterministic explanation (Einstein: "God
does not throw dice"). We appreciate very well that tossing a coin is
essentially a completely deterministic process. It is only uncontrolled
variability in initial conditions which leads to the outcome appearing to be
completely random. Might it also be the case that the randomness in the outcome
of a measurement of a quantum system is 'merely' the reflection of
statistical variability in some initial conditions? These would be so-called
hidden variables, because at present no physicist is aware what these
lower-level variables are and there is no known way to measure them directly.

In fact there already exist arguments aplenty that if there is a
deterministic hidden layer beneath quantum theory, it violates other cherished
physical intuitions, in particular the principle of locality; see again
Kümmerer and Maassen (1998), Gill (1998) for some introduction to the
phenomenon of entanglement, and further references. But let us ignore that
evidence and consider the new evidence from the present results. Consider two
identical copies of a given quantum state. Suppose there were a hidden
deterministic explanation for the randomness in the outcome of any measurement
on either or both of these particles. Such an explanation would involve hidden
variables $\omega_1$, $\omega_2$ specifying the hidden state of the two
particles. Since applying separate measurements to the two systems produces
independent outcomes, and since the outcomes of the same measurements are
identically distributed, one would naturally suppose that these two variables
are independent and identically distributed. Their distributions would of
course depend on the unknown parameter $\theta$. Now when we measure the joint
system, there could be other sources of randomness in our experiment, possibly
even quantum randomness, but still it would not have a distribution depending
on $\theta$. So let us assume there is a third random element $\omega_M$ such
that the outcome of the measurement $M$ on the system
$\rho(\theta) \otimes \rho(\theta)$ is a deterministic function of $\omega_1$,
$\omega_2$ and $\omega_M$; the first two are independent and identically
distributed, with marginal distributions depending on $\theta$, while the
distribution of $\omega_M$ given the other two is independent of $\theta$. Thus
the random outcome $X$ of the measurement $M$ is just
$X(\omega_1, \omega_2, \omega_M)$, a random variable on the probability space
$(\Omega \times \Omega \times \Omega_M,\ (P_\theta \times P_\theta) * P_M)$,
where $P_M$ is some Markov kernel from $\Omega \times \Omega$ to $\Omega_M$. Now
it is well known from ordinary statistics that the Fisher information in
$\theta$ from the distribution of any random variable defined on this space is
at most twice the information in one observation of $\omega_1$ itself, seen as
a random variable defined on $(\Omega, P_\theta)$. Thus if one could realise
any $\Omega_M$, $P_M$ and any $X$ whatsoever by suitable choice of measurement
$M$, achievable Fisher information would be additive!
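
The classical inequality invoked here can be checked by brute force in a toy model. In the following Python sketch the two hidden variables are taken, purely as an illustrative assumption, to be independent Bernoulli($\theta$) coins, and the extra randomisation $\omega_M$ is left out (its distribution does not depend on $\theta$, so it cannot add information). The code enumerates every deterministic function $X$ of the pair with at most four values and confirms that none carries more than twice the information of a single coin.

```python
import itertools
import numpy as np

def fisher_information(prob, dprob):
    """Fisher information of a finite distribution, given the probabilities and
    their derivatives with respect to a scalar parameter."""
    mask = prob > 0
    return float(np.sum(dprob[mask] ** 2 / prob[mask]))

theta = 0.3                                     # hypothetical parameter value
pairs = list(itertools.product([0, 1], repeat=2))   # values of (omega_1, omega_2)
k = np.array([a + b for a, b in pairs])         # number of ones in each pair
p = theta**k * (1 - theta) ** (2 - k)           # joint probabilities
dp = p * (k / theta - (2 - k) / (1 - theta))    # derivatives: probability times score

info_one = 1.0 / (theta * (1 - theta))          # information in a single coin

# Every deterministic X = f(omega_1, omega_2) with values in {0, 1, 2, 3}.
best = 0.0
for labels in itertools.product(range(4), repeat=4):
    labels = np.array(labels)
    prob = np.array([p[labels == x].sum() for x in range(4)])
    dprob = np.array([dp[labels == x].sum() for x in range(4)])
    best = max(best, fisher_information(prob, dprob))

print("largest information over all X:", best)
print("twice the single-coin value   :", 2 * info_one)
```

The maximum is attained by the one-to-one functions, which simply report the pair itself.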

What can we conclude from the fact that achievable Fisher information
is not additive? We cannot rule out hidden variable models such as
the above. But apparently, the hidden variables are so well hidden that
we cannot uncover them from any measurements on single particles, i.e.,
it is not possible to realise any $(\Omega_M, P_M)$ and any $X$ whatever by
appropriate choice of experimental set-up. However we can uncover the hidden
variables better, apparently, from appropriate measurements on several
particles brought together, even though these particles have nothing whatever
to do with one another: their hidden variables are independent and
identically distributed. Alternatively the explanation must be found in some
pathological non-measurability or non-regularity of the statistical model we
have just introduced. Whatever escape-route one chooses, it is clear that
if there is a deterministic explanation for quantum randomness, it is a very